Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Tabular output not generated

Run on: Fri May 2 09:13:09 1999; MasPar time 43.77 Seconds

916.462 Million cell updates/sec 
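
The throughput figure can be roughly reproduced from the other header values. The sketch below assumes that one "cell update" is one comparison of a query residue against a database residue; that definition is an assumption and is not stated in the output itself.

```python
# Rough check of the reported throughput.
# Assumption: cell updates = query length x total database residues.
query_length = 735          # residues in the query, (1-735)
db_residues = 54_579_741    # residues searched
maspar_seconds = 43.77      # reported MasPar time

cell_updates = query_length * db_residues
rate_millions = cell_updates / maspar_seconds / 1e6
print(f"{rate_millions:.1f} million cell updates/sec")
```

The result lands within a fraction of a percent of the reported 916.462; the small difference is consistent with the run time being rounded to two decimals.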



US-09-191-647-9

(1-735) from US09191647.pep

5438 

1 SNKNLTSFPSRIPFDTTELY TVHIIRQCQCEPTKSVLSEK 735 

PAM 150 
Gap 11 



179066 seqs, 54579741 residues 

Post-processing: Minimum Match 0%

Listing first 45 summaries 

sptrembl9 

1:sp_archea 2:sp_bacteria 3:sp_fungi 4:sp_human
5:sp_invertebrate 6:sp_mammal 7:sp_mhc 8:sp_organelle
9:sp_phage 10:sp_plant 11:sp_rodent 12:sp_unclassified
13:sp_vertebrate 14:sp_virus



Statistics: Mean 49.268; Variance 89.604; scale 0.550 



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution.




SUMMARIES

Result         Query
   No.  Score  Match  Length  DB  ID      Description             Pred. No.

     1   4486   82.5     601   5  Q20204  F40E10.4 PROTEIN (FRAG  0.00e+00
     2   1651   30.4    1531  11  O88279  MEGF4.                  0.00e+00
     3   1551   28.5    1523  11  O88280  MEGF5.                  0.00e+00
     4   1424   26.2     739   4  O75094  MEGF5 (FRAGMENT).       9.79e-287
     5   1188   21.8     530   5  Q24526  SLIT LOCUS ENCODING A   1.98e-232
     6    578   10.6    2653   5  Q25253  NOTCH HOMOLOG SCALLOPE  6.53e-95
     7    570   10.5    2447  13  O13149  NOTCH 2 (FRAGMENT).     3.70e-93
     8    565   10.4     529   5  Q25058  FIBROPELLIN IA (FRAGME  4.60e-92
     9    559   10.3     406   5  Q25059  FIBROPELLIN III (FRAGM  9.44e-91
    10    540    9.9    1203  11  Q06008  NOTCH PROTEIN HOMOLOG   1.32e-86
    11    540    9.9    2470  11  O35516  CELL SURFACE PROTEIN.   1.32e-86
    12    533    9.8     728  13  Q90656  TRANSMEMBRANE PROTEIN   4.42e-85
    13    528    9.7     721  13  Q91902  X-DELTA-1.              5.41e-84
    14    524    9.6     717  13  P87357  DELTAD TRANSMEMBRANE P  4.01e-83
    15    521    9.6     832   5  Q99108  NEUROGENIC LOCUS DELTA  1.80e-82
    16    522    9.6    1476  13  090285  PUTATIVE EXTRACELLULAR  1.09e-82
    17    516    9.5     752  13  O42374  NOTCH RECEPTOR PROTEIN  2.18e-81
    18    519    9.5    2531   5  O16004  NOTCH HOMOLOG.          4.88e-82
    19    499    9.2     642  13  P79941  NOTCH LIGAND X-DELTA-2  1.05e-77
    20    500    9.2     802  13  O57462  DELTAA.                 6.38e-78
    21    495    9.1    2352   5  O61240  HRNOTCH PROTEIN.        7.67e-77
    22    492    9.0     615  13  O57409  DELTAB.                 3.41e-76
    23    489    9.0    1218   4  Q15816  TRANSMEMBRANE PROTEIN   1.51e-75
    24    489    9.0    1218   4  O14902  TRANSMEMBRANE PROTEIN   1.51e-75
    25    487    9.0    1218   4  O15122  JAGGED1.                4.09e-75
    26    489    9.0    1227   4  P78504  JAGGED 1 (TRANSMEMBRAN  1.51e-75
    27    477    8.8     263   4  Q99734  NOTCH2 TRANSMEMBRANE P  5.83e-73
    28    480    8.8    1372   5  P91526  SIMILARITY TO MULTIPLE  1.32e-73
    29    471    8.7    1193  13  Q90819  C-SERATE-1 PROTEIN (FR  1.14e-71
    30    474    8.7    1219  11  Q63722  JAGGED PROTEIN.         2.57e-72
    31    461    8.5    1212  13  O42347  C-SERRATE-2 (FRAGMENT)  1.59e-69
    32    463    8.5    1964  11  O35442  NOTCH4.                 5.92e-70
    33    444    8.2    1202  11  P97607  JAGGED2 (FRAGMENT).     6.87e-66
    34    445    8.2    1687  11  Q61204  NOTCH2-LIKE (EGF REPEA  4.20e-66
    35    447    8.2    1722   5  Q19350  SIMILAR TO EGF-LIKE RE  1.57e-66
    36    426    7.8     434  11  O55139  JAGGED2 PROTEIN (FRAGM  4.66e-62
    37    422    7.8    1999   4  Q99940  NOTCH4.                 3.29e-61
    38    422    7.8    2003   4  O00306  NOTCH4.                 3.29e-61
    39    417    7.7     955   4  Q99466  NOTCH4 (FRAGMENT).      3.77e-60
    40    408    7.5     518  11  O70219  JAGGED 2 (JAGGED 2 PRO  3.02e-58
    41    404    7.4     383  11  O70534  ZOG.                    2.11e-57
    42    404    7.4     383  11  Q62779  PREADIPOCYTE FACTOR 1,  2.11e-57
    43    384    7.1     387  11  Q06007  NOTCH PROTEIN HOMOLOG   3.37e-53
    44    378    7.0     156   5  Q26661  EPIDERMAL GROWTH FACTO  6.06e-52
    45    380    7.0     473   5  Q25464  ADHESIVE PLAQUE MATRIX  2.31e-52


ALIGNMENTS 



RESULT 1

ID Q20204 PRELIMINARY; PRT; 601 AA.

AC Q20204; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)

DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE) 

DE F40E10.4 PROTEIN (FRAGMENT). 

GN F40E10.4. 

OS CAENORHABDITIS ELEGANS. 

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA; 

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS.

RN [1]

RP SEQUENCE FROM N.A.

RA SMYE R.;

RL SUBMITTED (FEB-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 94150718. 

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,

RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,

RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,

RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,

RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,

RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.

RT elegans.";

RL NATURE 368:32-38(1994). 

DR EMBL; Z69792; E1346469; -. 

DR PROSITE; PS01187; EGF_CA; 1.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 601 AA; 66669 MW; F8A72773 CRC32; 

Query Match 82.5%; Score 4486; DB 5; Length 601; 

Best Local Similarity 100.0%; Pred. No. 0.00e+00;

Matches 601; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Db    1 IKSKFIEAGIARCEYPNTVSNQLLLTAQPYQFTCDSKVPTKLATKCDLCLNSPCKNNAIC 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy  135 IKSKFIEAGIARCEYPNTVSNQLLLTAQPYQFTCDSKVPTKLATKCDLCLNSPCKNNAIC 194

Db   61 ETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFNCYCNKGFEGDYC 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy  195 ETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFNCYCNKGFEGDYC 254

Db  121 EKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEYEGKHCEDKLEYC 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy  255 EKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEYEGKHCEDKLEYC 314

Db  181 TKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNGGSCVDGILSYDC 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy  315 TKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNGGSCVDGILSYDC 374

Db  241 LCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNSSDFTCKCHEGFSGPSCD 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy  375 LCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNSSDFTCKCHEGFSGPSCD 434

Db  301 RQMSVGFKNPGAYLALDPLASDGTITMTLRTTSKIGILLYYGDDHFVSAELYDGRVKLVY 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy  435 RQMSVGFKNPGAYLALDPLASDGTITMTLRTTSKIGILLYYGDDHFVSAELYDGRVKLVY 494

Db  361 YIGNFPASHMYSSVKVNDGLPHRISIRTSERKCFLQIDRNPVQIVENSGKSDQLITKGKE 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy  495 YIGNFPASHMYSSVKVNDGLPHRISIRTSERKCFLQIDKNPVQIVENSGKSDQLITKGKE 554

Db  421 MLYIGGLPIEKSQDAKRRFHVKNSESLKGCISSITINEVPINLQQALENVNTEQSCSATV 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy  555 MLYIGGLPIEKSQDAKRRFHVKNSESLKGCISSITINEVPINLQQALENVNTEQSCSATV 614

Db  481 NFCAGIDCGNGKCTNNALSPKGYMCQCDSHFSGEHCDERRIRCDKQKFRRHHIENECRSV 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy  615 NFCAGIDCGNGKCTNNALSPKGYMCQCDSHFSGEHCDEKRIKCDRQKFRRHHIENECRSV 674

Db  541 DRIKIAECNGYCGGEQNCCTAVRKKQRRVRMICRNGTTRISTVHIIRQCQCEPTKSVLSE 600
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy  675 DRIKIAECNGYCGGEQNCCTAVRKKQRKVRMICKNGTTKISTVHIIRQCQCEPTKSVLSE 734

Db  601 K 601
        |
Qy  735 K 735



RESULT 2

ID O88279 PRELIMINARY; PRT; 1531 AA.
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF4.
GN MEGF4.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011530; D1033423; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1531 AA; 167497 MW; 5C5EBDF4 CRC32;



Query Match 30.4%; Score 1651; DB 11; Length 1531; 

Best Local Similarity 39.4%; Pred. No. 0.00e+00; 

Matches 259; Conservative 162; Mismatches 199; Indels 37; Gaps 25; 
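
In the entries of this listing, the "Best Local Similarity" figure agrees with Matches divided by the sum of Matches, Conservative, Mismatches and Indels. That relationship is inferred from the numbers shown here, not taken from the program's documentation, so the sketch below is a consistency check under that assumption. The helper names are illustrative.

```python
import re

def parse_counts(line: str) -> dict:
    """Parse 'Name N;' pairs from an alignment statistics line."""
    return {name: int(value)
            for name, value in re.findall(r"([A-Za-z]+)\s+(\d+);", line)}

def local_similarity(counts: dict) -> float:
    """Assumed formula: matches over all aligned columns, as a percentage."""
    columns = sum(counts[k] for k in
                  ("Matches", "Conservative", "Mismatches", "Indels"))
    return round(100.0 * counts["Matches"] / columns, 1)

line = "Matches 259; Conservative 162; Mismatches 199; Indels 37; Gaps 25;"
counts = parse_counts(line)
print(counts)
print(local_similarity(counts))
```

For the statistics line above this gives 39.4, the value printed for this result; the same check can be applied to the other results to spot OCR damage in their counts.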

Db 748 SNKHLQALPKGIPRNVTELYLDGNQFTLVPGQ-LSIFRY-LQLVDLSNNKISSLSNSSFT 805



llhl ::l II : lllllhl : :|:: |: : I I :|||:|:: II |::|: 
Qy 1 SNKNLTSFPSRIPFDTTELYLDANYINEIPAHDLNRL-YSLTKLDLSHNRLISLENNTFS 59

Db 806 NMSQLTTLILSYNALQCIPPLAFQGLRSLRLLSLHGNDVSTLQEGIFADVTSLSHLAIGA 865

;::|:IM:|| |:|: Mil || : 1 1 : 1 1 1 ! 1 1 1 : 1 I :: ! : : : 1 1 : : | : | : | : 
Qy 60 NLTRLSTLIISYNKLRCLQPLAFNGLNALRILSLHGNDISFLPQSAFSNLTSITHIAVGS 119

Db 866 NPLYCDCHLRWLSSWVKTGYKEPGIARCAGPPEMEGKLLLTTPARKFECQGPPSLAVQAR 925

hlllll:: 1:1 |:|: : hlllll I : III: : | |:: : : :| 
Qy 120 NSLYCDCNMAWFSRWIKSKFIEAGIARCEYPNTVSNQLLLTAQPYQFTCDSRVPTKLATR 179



Db 926 CDPCLSSPCQNQGTCHNDPLEVYRCTCPSGYKGRNCEVSLDSCSSNPCGNGGTCHAQEGE 985 

II Ihlll hi : MM: :|l :|;| ::l I :|| : 
Qy 180 CDLCLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQA- 238 

Db 986 DAGFTCSCPSGFEGLTCGMNTDDCVRHDCVNGG V--C— V-D--G--IGNYTCQ 1030 

I I I Mil I :| Nil: I III I : 

Qy 239 GR-FNCYCNRGFEGDYCERNIDDCVNSRCENGGKCVDLVRFCSEELKNFQSFQINSYRCD 297 

Db 1031 CPLQYTGRACEQLVDFCSPDLNPCQHEAQCVGTPEGPRCECVPGYTGDNCSKNQDDCKDH 1090 

ll::| I: II: :::|: lll|:::: I: : I I l|:||:|| I Ml: 
Qy 298 CPMEYEGRHCEDRLEYCTKRLNPCENNGRCIPINGSYSCMCSPGFTGNNCETNIDDCRNV 357 

Db 1091 QCQNGAQCVDEINSYACLCAEGYSGQLCEIPPAPRNSCEGTE-CQ-N--G-ANCVD-QGS 1144 

:lll|: III I II III 1:1 HI : |: II : | ::|| | 
Qy 358 ECQNGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNS 417 

Db 1145 RP-VCQCLPGFGGPECEKLLSVNFVDRDTYLQFTDLQNWPRANITLQVSTAEDNGILLYN 1203 

I I 11:11 I:: :|| I : :|| : I : : l|: : |: Mill 
Qy 418 SDFTCRCHEGFSGPSCDRQMSVGFKNPGAYLALDPLAS-D-GTITMTLRTTSKIGILLYY 475 

Db 1204 GDNDHIAVELYQGHVRVSYDPGSYPSSAIYSAETINDGQFHTVELVTFDQMVNLSIDGGS 1263 

II: :: llhhh: I |::|:| :||: :||| | : : | ::: I : 
Qy 476 GDDHFVSAELYDGRVRLVYYIGNFPASHMYSSVRVNDGLPHRISIRTSERRCFLQIDRNP 535 

Db 1264 PMTMDNFGKHYTLNSEA-P-LYVGGMPVDVNSAAFRLWQILNGTSFHGCIRNLYINN-" 1318 

::| II I : : ll:||:|:: : II :: |: |: III :: ||: 
Qy 536 VQIVENSGRSDQLITRGKEMLYIGGLPIERSQDARRRFHVRNSESLKGCISSITINEVPI 595 

Db 1319 ELQD-FTKTQMKPGWPGCEPCRKLYCLHGICQPNA-TP-GPVCHCEAGWGGLHCDQ 1372 

:lh : : : : : I : I : I I 1 1 : 1 I :|:|:: :| |||:' 
Qy 596 NLQQALENVNTEQSCSATVNFCAGIDCGNGKCTNNALSPKGYMCQCDSHFSGEHCDE 652 



RESULT 3

ID O88280 PRELIMINARY; PRT; 1523 AA.
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF5.
GN MEGF5.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011531; D1033424;
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1523 AA; 167767 MW;

Query Match 28.5%; Score 1551; DB 11; Length 1523;
Best Local Similarity 36.7%; Pred. No. 0.00e+00;
Matches 241; Conservative 173; Mismatches 209; Indels 34; Gaps 17;

Db 739 SNRGLHTLPKGMPKDVTELYLEGNHLTAVP-KELSTFRQLTLIDLSNNSISMLTNHTFSN 797

II: I ::l :| I lllll::! : :l :|: : II :lll:l : I Mill 
Qy 1 SNRNLTSFPSRIPFDTTELYLDANYINEIPAHDLNRLYSLTRLDLSHNRLISLENNTFSN 60 

Db 798 MSHLSTLILSYNRLRCIPVHAFNGLRSLRVLTLHGNDISSVPEGSFNDLTSLSHLALGIN 857 

:::|I!II:|!|:IM: III :||:|:|||||M : I : : : I : : 1 1 1 : : I : I : I 
Qy 61 LTRLSTLIISYNKLRCLQPLAFNGLNALRILSLHGNDISFLPQSAFSNLTSITHIAVGSN 120

Db 858 PLHCDCSLRWLSEWIRAGYREPGIARCSSPESMADRLLLTTPTHRFQCKGPVDINIVAKC 917 

:| III:: hi III: : Mill |::::::||||: :| I : I :: :|| 
Qy 121 SLYCDCNMAWFSKWIKSKFIEAGIARCEYPNTVSNQLLLTAQPYQFTCDSRVPTKLATKC 180 

Db 918 NACLSSPCKNNGTCSQDPVEQYRCTCPYSYKGKDCTVPINTCVQNPCQHGGTCHLSESHR 977 

: MINN: I : I I I :: I I |::| :|| : :|| :::: | 
Qy 181 DLCLNSPCKNNAICETTSSRRYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGR 240 

Db 978 DGFSCSCPLGFEGQRCEINPDDCEDNDCENSATCVD G--INNYACVC 1022
hi I lllh II I III :: III:: III : Ihl I I 

Qy 241 --FNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDC 298

Db 1023 PPNYTGELCDEVIDYCVPEMNLCQHEAKCISLDKGFRCECVPGYSGKLCETDNDDCVAHK 1082 

I :| I h: "II :| |::::|||::: :: I I ||::|: III: III 
Qy 299 PMEYEGKHCEDRLEYCTKRLNPCENNGRCIPINGSYSCMCSPGFTGNNCETNIDDCKNVE 358 

Db 1083 CRHGAQCVDAVNGYTCICPQGFSGLFCEHPPPMVL-LQ-TSPCDQYECQNGAQCIWQQE 1140 

h:h III:: :| hi |::| :|| II I : II :|:| I I :|: I 
Qy 359 CQNGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQG-ECVASQNS 417 

Db 1141 P--TCRCPPGFAGPRCEKLITVNFVGKDSYVELASAKVRPQANISLQVATDRDNGILLYK 1198 

: Ihl Ihll h: ::| I :|: I : : ::: h: : I lllll 
Qy 418 SDFTCRCHEGFSGPSCDRQMSVGFRNPGAYIALDP--IASDGTITMTLRTTSKIGILLYY 475 

Db 1199 GDNDPLALELYQGHVRLVYDSLSSPPTTVYSVETVNDGQFHSVELVMLNQTLNLWDRGA 1258 

II: :: llhhhlll : h: :|| III! I : : :: I :|| : 
Qy 476 GDDHFVSAELYDGRVKLVYYIGNFPASHMYSSVKVNDGLPHRISIRTSERKCFLQIDRNP 535

Db 1259 PKSLGKLQK-QPAVGINSP-LYLGGIPTSTGLSALRQGADRPLGGFHGCIHEVRINNELQ 1316 

: : I : : Ihlhl : I h : :: III : lh 
Qy 536 VQIVENSGRSDQLITRGREMLYIGGLPIEKSQDAKRRFHVKNSESLKGCISSITINEVPI 595 

Db 1317 DFK - ALPP -QS -LGVSPGCKSCT - V-CRHGLC " RSVEKDSWCECHPGWTGPLCDQ 1366 

:: II : : h : h : I :| I :: : :|:| : :| lh 
Qy 596 NLQQALENVNTEQSCSATVNFCAGIDCGNGKCTNNALSPKGYMCQCDSHFSGEHCDE 652

RESULT 4

ID O75094 PRELIMINARY; PRT; 739 AA.

AC O75094;

DT 01-NOV-1998 (TREMBLREL. 08, CREATED)

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE MEGF5 (FRAGMENT). 

GN MEGF5.

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 

OC CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A.

RC TISSUE=BRAIN;

RX MEDLINE; 98360089. 

RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;

RT "Identification of high-molecular-weight proteins with multiple

RT EGF-like motifs by motif-trap screening.";

RL GENOMICS 51:27-34(1998). 

DR EMBL; AB011538; D1033429; -. 

DR PROSITE; PS01185; CTCK_1; 1.

DR PROSITE; PS01186; EGF_2; 7.

DR PROSITE; PS01187; EGF_CA; 2.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 739 AA; 80364 MW; DC6BCB63 CRC32;



Query Match 26.2%; Score 1424; DB 4; Length 739; 

Best Local Similarity 37.0%; Pred. No. 9.79e-287; 

Matches 226; Conservative 152; Mismatches 199; Indels 34; Gaps 17; 

Db 6 LTNYTFSNMSHLSTLILSYNRLRCIPVHAFNGLRSLRVLTLHGNDISSVPEGSFNDLTSL 65 

I I lllh::Mllhllhllh lllll :|hhlllllll : I : : : I : : I If : 
Qy 53 LENNTFSNLTRLSTLIISYNKLRCLQPLAFNGLNALRILSLHGNDISFLPQSAFSNLTSI 112 

Db 66 SHLALGTNPLHCDCSLRWLSEWVKAGYKEPGIARCSSPEPMADRLLLTTPTHRFQCKGPV 125 

:|:hl:hl llh: hi hh : hlllll h ::::||||: :| I : I 
Qy 113 THIAVGSNSLYCDCNMAWFSKWIKSKFIEAGIARCEYPNTVSNQLLLTAQPYQFTCDSKV 172 

Db 126 DINIVAKCNACLSSPCKNNGTCTQDPVELYRCACPYSYKGKDCTVPINTCIQNPCQHGGT 185 

:: :lh Ihlllllh I : I I I :: I I |::| :|| : :| 
Qy 173 PTKLATKCDLCLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNAT 232 

Db 186 CHLSDSHKDGFSCSCPLGFEGQRCEINPDDCEDNDCENNATCVD G-- 230 

I :::: : hi I lllh II I III :: III : III : 
Qy 233 CKVAQAGR-FNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQ 290 

Db 231 INNYVCICPPNYTGELCDEVIDHCVPELNLCQHEAKCIPLDKGFSCECVPGYSGKLCETD 290 

Ihl I II :| I h: :: I II h:::lllh: ::ll I lh:h III: 
Qy 291 INSYRCDCPMEYEGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETN 350 

Db 291 NDDCVAHKCRHGAQCVDTINGYTCTCPQGFSGPFCEHPPPMVL-LQ-TSPCDQYECQNGA 348 

III h:h III MM M : 1 1 i I I : II :|:| I I 
Qy 351 IDDCKNVECQNGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQG- 409 

Db 349 QCIWQQEP--TCRCPPGFAGPRCEKLITVNFVGKDSYVELASAKVRPQANISLQVATDK 406 

:|: I : Ihl Ihll h: ::| I :|: I : : ::: h: : I 
Qy 410 ECVASQNSSDFTCKCHEGFSGPSCDRQMSVGFKNPGAYLALDP-LASDGTITMTLRTTS 467 

Db 407 DNGILLYKGDNDPLALELYQGHVRLVYDSLSSPPTTVYSVETVNDGQFHSVELVTLNQTL 466 

lllll II: :: llhhhlll : M M UN MM:: 
Qy 468 KIGILLYYGDDHFVSAELYDGRVKLVYYIGNFPASHMYSSVKVNDGLPHRISIRTSERKC 527 

Db 467 NLWDKGTPKSLGKLQK-QPAVGINSP-LYLGGIPTSTGLSALRQGTDRPLGGFHGCIHE 524 

Mil : : I : : Ihlhl : I h : :: III 
Qy 528 FLQIDKNPVQIVENSGKSDQLITKGREMLYIGGLPIEKSQDAKRRFHVKNSESLKGCISS 587 

Db 525 VRINNELQDFR-ALPP-QS-LGVSPGCKSCT-V-CKHGLC-RSVERDSWCECRPGWTG 577 

: lh :: II : : I : : I : M : I I : : : :|:| : :| 
Qy 588 ITINEVPINLQQALENVNTEQSCSATVNFCAGIDCGNGKCTNNALSPKGYMCQCDSHFSG 647 

Db 578 PLCDQEARDPC 588 

lllll 
Qy 648 EHCD-EKRIKC 657 



RESULT 5 

ID Q24526 PRELIMINARY; PRT; 530 AA.

AC Q24526;

DT 01-NOV-1996 (TREMBLREL. 01, CREATED)

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE SLIT LOCUS ENCODING A PROTEIN ASSOCIATED WITH NEURAL DEVELOPMENT WITH 

DE 52D EGF HOMOLOGOUS DOMAINS (FRAGMENT). 

OS DROSOPHILA MELANOGASTER (FRUIT FLY).

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN-CANTON S; 

RX MEDLINE; 89077533. 

RA ROTHBERG J.M., HARTLEY D.A., WALTHER Z., ARTAVANIS-TSAKONAS S.;

RT "slit: an EGF-homologous locus of D. melanogaster involved in the 

RT development of the embryonic central nervous system."; 

RL CELL 55:1047-1059(1988). 

DR EMBL; M23543; G514357; -. 

DR FLYBASE; FBgn0003425; sli. 

DR PROSITE; PS01186; EGF_2; 5.

DR PROSITE; PS01187; EGF_CA; 2.

DR PFAM; PF00008; EGF; 7.

DR PFAM; PF00054; laminin_G; 1.

KW NEUROGENESIS; GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 530 530

SQ SEQUENCE 530 AA; 59457 MW; 10E5764D CRC32;

Query Match 21.8%; Score 1188; DB 5; Length 530;

Best Local Similarity 36.9%; Pred. No. 1.98e-232;
Matches 171; Conservative 111; Mismatches 159; Indels 22; Gaps 16;

Db 1 MKDKLILSTPSSSFVCRGRVRNDIIAKCNACFEQPCQNQAQCVALPQREYQCLCQPGYHG 60 

: : = : : I I "I : :||: I : : 1 1 I I I : : I I I I 1 1 : I 
Qy 153 VSNQLILTAQPYQFTCDSKVPTKLATKCDLCLNSPCKNNAICETTSSRKYTCNCTPGFYG 212 

Db 61 KHCEFMIDACYGNPCRNNATCTVLEEGRFSCQCAPGYTGARCETNIDDCL-G--EI--KC 115

Ml MIMhll Mil! I : 11:1 I |: I II Mill: : I 
Qy 213 VHCENQIDACYGSPCLNNATCKVAQAGRFNCYCNKGFEGDYCEKNIDDCVNSKCENGGKC 272 

Db 116 ONNAT-C — I-D- -G- -VESYKCECQPGFSGEFCDTKIQFCSPEFNPCANGAKCMDHFT 166 
: I : : : ::||:|:| : I |: |:::|: :||| | :||: 
Qy 273 VDLVRFCSEELKNFQSFQINSYRCDCPMEYEGKHCEDKLEYCTKKLNPCENNGKCIPING 332



Db 167 HYSCDCQAGFHGTNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMIS 226

Ml I Ml I II Mill | IIIMIill |: | |:| Ml |: 
Qy 333 SYSCMCSPGFTGNNCETNIDDCKNVECQNGGSCVDGILSYDCLCRPGYAGQYCEIPPMMD 392 

Db 227 MMYPQTSPCQNHECKHGVCFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELE 286 

I I I :M I :| I :: :||: |:|| |::| |: |::| : ::: |: 
Qy 393 MEYQKTDACQQSACGQGECV-ASQNSSDFTCKCHEGFSGPSCDRQMSVGFKNPGAYLALD 451 

Db 287 PLRTRPEANVTIVF-SSGQNGILHYDGQDAHLAVELFNGRIRVSYDVGNHPVSTMYSFEM 345 

M : ::: :|: : ::: 1 1 1 : 1 Ml :: ||::||::: I :|| I I III : 
Qy 452 PL-A-SDGTITMTLRTTSKIGILLYYGDDHFVSAELYDGRVRLVYYIGNFPASHMYSSVK 509 

Db 346 VADGKYHAVEL-LAIKKNFTLRVDRGLARSIINEGSNDYLKLT-TPM-FLGGLPVDPAQQ 402 

Ml I : : : M I |::|: : : I I :| I I ::||||:: :|: 
Qy 510 VNDGLPHRISIRTSERKCF-LQIDKNPVQIVENSGKSDQLITKGKEMLYIGGLPIEKSQD 568 

Db 403 AYKNWQIRNLTSFKGCMKEVWINHKLVDFGNAQRQQKITPGCA 445 

I : :::| |:|||: : II ::: I : :|: 
Qy 569 AKRRFHVRNSESLKGCISSITINEVPINLQQALENVNTEQSCS 611 



RESULT 6 

ID Q25253 PRELIMINARY; PRT; 2653 AA.

AC Q25253; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE NOTCH HOMOLOG SCALLOPED WINGS (SCL).
GN SCL.

OS LUCILIA CUPRINA (GREENBOTTLE FLY) (AUSTRALIAN SHEEP BLOWFLY).

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; OESTROIDEA; CALLIPHORIDAE;

OC LUCILIA.

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=SS SEEKING; 

RX MEDLINE; 96400928.

RA DAVIES A.G., GAME A.Y., CHEN Z., WILLIAMS T.J., GOODALL S., YEN J.L.,

RA MCKENZIE J.A., BATTERHAM P.;

RT "Scalloped wings is the Lucilia cuprina Notch homologue and a 

RT candidate for the modifier of fitness and asymmetry of diazinon 

RT resistance."; 

RL GENETICS 143:1321-1337(1996). 

RN [2] 

RP SEQUENCE OF 39-265 FROM N.A. 

RC STRAIN=SS SEEKING; 

RA CHEN Z., NEWSOME T., MCKENZIE J.A., BATTERHAM P.;

RL SUBMITTED (DEC-1997) TO EMBL/GENBANK/DDBJ DATA BANKS. 

RN [3] 

RP SEQUENCE OF 39-265 FROM N.A.

RC STRAIN=SS SEEKING;

RA CHEN Z., MCKENZIE J.A., BATTERHAM P.;

RL SUBMITTED (NOV-1997) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; 058977; G1389670; -. 

DR EMBL; AF032672; G2654074; -. 

DR EMBL; AF032670; G2654074; JOINED. 

DR EMBL; AF032671; G2654074; JOINED. 

DR EMBL; AF032673; G2654075; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS01186; EGF_2; 28. 

DR PROSITE; PS01187; EGF_CA; 21. 

DR PFAM; PF00008; EGF; 35. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3. 

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 2653 AA; 285928 MW; 8F35FD2D CRC32;

Query Match 10.6%; Score 578; DB 5; Length 2653;

Best Local Similarity 39.7%; Pred. No. 6.53e-95;

Matches 100; Conservative 50; Mismatches 75; Indels 27; Gaps 15;



Db 662 CHSNPCNNGATC-IDGINKYTCQCVPGFTGVHCEININECASNPCANNGVCMDLVNG-YK 719 

I ::IMI I I : MM I Ml Mill I: I ::|| II: h I :: 
Qy 183 CLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFN 242 

Db 720 CECPRGFYDPRCLSDVDECASNPCINGGRC-E DGI-H-E-F-I— -CHCPPG? 764 

I I :M I ::hl :: I llhl : : : I : I I III 
Qy 243 CYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEY 302 

Db 765 GGKRCENDIDECSS-NPCQHGGFCVDELNAFKCQCMPGYTGLKCETNIDDCINNPCANG 822 

l|:M: :: I: III:; I |: :: I I 1 1 : 1 1 Mlllllll I I II 
Qy 303 EGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNG 362 

Db 823 GTCIDKVNGYKCVCKVPYTGQDCE- ■ SKLD- PCA - TNRCRNDA- - K - -CTPSPNFLDFSC 874 

1:1:1 = :l MM MM II :::| |: |: I | :| | ||:| 
Qy 363 GSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNSSDFTC 422 

Db 875 TCKLGYTGRYCD 886 

I MM II 
Qy 423 KCHEGFSGPSCD 434 



RESULT 7

ID O13149 PRELIMINARY; PRT; 2447 AA.

AC O13149;

DT 01-JUL-1997 (TREMBLREL. 04, CREATED) 

DT 01-JUL-1997 (TREMBLREL. 04, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE NOTCH 2 (FRAGMENT).

OS FUGU RUBRIPES (JAPANESE PUFFERFISH).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII;

OC TELEOSTEI; EUTELEOSTEI; ACANTHOPTERYGII; PERCOMORPHA;

OC TETRAODONTIFORMES; TETRAODONTOIDEI; TETRAODONTIDAE; FUGU.

RN [1]

RP SEQUENCE FROM N.A.

RA NAKAMURA T., TROWSDALE J.;

RL SUBMITTED (JUN-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; AB004829; D1021371; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS01186; EGF_2; 29.

DR PROSITE; PS01187; EGF_CA; 20.

DR PFAM; PF00008; EGF; 35.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 2447 AA; 262542 MW; 3CDA4F7A CRC32;

Query Match 10.5%; Score 570; DB 13; Length 2447;

Best Local Similarity 39.1%; Pred. No. 3.70e-93; 

Matches 101; Conservative 49; Mismatches 79; Indels 29; Gaps 18; 

Db 713 DECLPNPCQNGGSC-LDRHNGFTCVCQAGYRGVNCEKNIDECTSGPCLNQGIC-IDGLNS 770 
I II Ml I : I :|i I :|: INI: II I ::|||| : I : 
Qy 181 DLCLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGR 240

Db 771 YTCQCVPPFAGEHCEVELDPCSSRPCQRGGVC--L-P--SAD— Y-TY— -FTCRCPA 817 

: I I I I: II ::| I : I: II I I I : : :: : | || 
Qy 241 FNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPM 300 

Db 818 GWQGLHCSEDVNEC-KK-NPCRNGGHCINSPGSYICKCPSGYSGHNCQTDIDDCSPNPCL 875 

:| II : :: I II III I I II III I : I : : I : : 1 : 1 1 : 1 : 1 1 1 1 I 
Qy 301 EYEGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQ 360

Db 876 NGGSCVDDVGSFSCECRPGFEGEHCEIEA--D-ECA-SQPCRNGAIC-R-DYV---NS" 924 

lllllll : I: I Mil: I: III : II :::|: :| I : : I II 
Qy 361 NGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSA-CGQGECVASQNSSD 419 

Db 925 FVCECRLGFDGILCDHNI 942

I I I: II I II: : 
Qy 420 FTCKCHEGFSGPSCDRQM 437 



RESULT 8

ID Q25058 PRELIMINARY; PRT; 529 AA.

AC Q25058;

DT 01-NOV-1996 (TREMBLREL. 01, CREATED)

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE FIBROPELLIN IA (FRAGMENT). 

OS HELIOCIDARIS ERYTHROGRAMMA (SEA URCHIN). 

OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; EUECHINOIDEA; 

OC ECHINACEA; ECHINOIDA; ECHINOMETRIDAE; HELIOCIDARIS. 

RN [1] 

RP SEQUENCE FROM N.A.

RA BISGROVE B.W.; 

RL SUBMITTED (JUN-1995) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; L33861; G499686; -. 

DR PROSITE; PS00577; AVIDIN; 1. 

DR PROSITE; PS01186; EGF_2; 10. 

DR PROSITE; PS01187; EGF_CA; 7.

DR PFAM; PF00008; EGF; 10. 

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

FT NONJER 1 1 

SQ SEQUENCE 529 AA; 55543 MW; 6385F322 CRC32; 

Query Match 10.4%; Score 565; DB 5; Length 529; 

Best Local Similarity 39.0%; Pred. No. 4.60e-92;

Matches 99; Conservative 47; Mismatches 79; Indels 29; Gaps 17;

Db 99 DECASSPCQNGALC-VDQVNGYVCFCLPGFSGVHCETDIDECASSPCLNGGQC-INRINS 156
I I Mil Ihl III III Mill :H I MINI : I : : 

Qy 181 DLCLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGR 240 

Db 157 YECVCAAGFNGVNCQTNIDECASDPCENGG I —C— -I- A- -G- -VHGITCNCAS 201 

::| I 11:1 h llhl : Hill : I : : :|:| |:|: 
Qy 241 FNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPM 300 

Db 202 GYTGTNCETEIDECAS-M-PCLNGGQCIEMVNGYTCQCAAGFTGVLCETDIDECASDPCQ 259 

I I :ll :: I: : II I I II : :|:| |::|||| llhlhl : II 
Qy 301 EYEGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQ 360 

Db 260 NGGVCTDTVNGYICSCVQGFTGSDCETN--IN-ECA-SGPCQ-N--G-GTCVDGVNG-F 309 

III I I : H I I |::| II :: I : : 1 1 : I I 1 1 : I : I 
Qy 361 NGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNSSDF 420 

Db 310 VCQCPPNYTGTYCE 323 

I I ::| I: 
Qy 421 TCKCHEGFSGPSCD 434 



RESULT 9 

ID Q25059 PRELIMINARY; PRT; 406 AA. 
AC Q25059; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)



DE FIBROPELLIN III (FRAGMENT). 

OS HELIOCIDARIS ERYTHROGRAMMA (SEA URCHIN) . 

OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; EUECHINOIDEA; 

OC ECHINACEA; ECHINOIDA; ECHINOMETRIDAE; HELIOCIDARIS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RA BISGROVE B.W.; 

RL SUBMITTED (JUN-1995) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; L33862; G499688; -.

DR PROSITE; PS00577; AVIDIN; 1.

DR PROSITE; PS01186; EGF_2; 6.

DR PROSITE; PS01187; EGF_CA; 5.

DR PFAM; PF00008; EGF; 7.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 406 AA; 43475 MW; 45E6EE2C CRC32;

Query Match 10.3%; Score 559; DB 5; Length 406;

Best Local Similarity 38.3%; Pred. No. 9.44e-91;

Matches 98; Conservative 54; Mismatches 75; Indels 29; Gaps 17;

Db 16 CNPNPCQNGAAC - IDQVNDYECICPPGFTGDNCETDIDVCASAPCRNGGAC -VDGVNGYT 73 

I :H I I I I I I III I Ml :|| I ::|| I ::| I : 
Qy 183 CLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFN 242 

Db 74 CNCIPGFDGDNCENNINECASNPCQNGGVCID G--V-H--GP-V — CTCQPGY 118 

I I 11:11 ll:M::| :: Ml |:| : : | :| : | | | 
Qy 243 CYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEY 302 

Db 119 TGTLCETDIDECAS--NPCQNGGVCTDLVNMYTCDCLAGFTGSNCETNINECASNPCLNG 176 

I II - I: llhl II: M I : II 1 1 : 1 1 1 1 1 1 : : I : I II 
Qy 303 EGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNG 362 

Db 177 GACVDGVNGYVCQCLPNYTGTHCEIS--LDV-CQSM-PCQ-N--G-ATC-TN-VGGDYSC 226 

hllll: :| I I I hi llh :|: I :|| : I : I :: ::|::| 
Qy 363 GSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNSSDFTC 422 

Db 227 ECPPGYTGINCEIDIN 242 

I h:| :h ::: 
Qy 423 KCHEGFSGPSCDRQMS 438 



RESULT 10 

ID Q06008 PRELIMINARY; PRT; 1203 AA. 

AC Q06008;

DT 01-NOV-1996 (TREMBLREL. 01, CREATED)

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE NOTCH PROTEIN HOMOLOG 2 (MOTCH B PROTEIN) (FRAGMENT).

GN NOTCH2 OR MOTCH B.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=F1 (CBA X C57BL); TISSUE=WHOLE EMBRYO;

RX MEDLINE; 93178563.

RA LARDELLI M., LENDAHL U.;

RT "Motch A and motch B-two mouse Notch homologues coexpressed in a

RT wide variety of tissues.";

RL EXP. CELL RES. 204:364-372(1993).

DR EMBL; X68279; G287990; -.

DR MGD; MGI:97364; NOTCH2.

DR PFAM; PF00008; EGF; 27.

DR PFAM; PF00066; notch; 1.

KW DIFFERENTIATION; NEUROGENESIS; REPEAT.

FT NON_TER 1 1

FT NON_TER 1203 1203

SQ SEQUENCE 1203 AA; 128982 MW; A5A95551 CRC32;

Query Match 9.9%; Score 540; DB 11; Length 1203;

Best Local Similarity 38.8%; Pred. No. 1.32e-86;

Matches 80; Conservative 50; Mismatches 57; Indels 19; Gaps 6; 
Db 558 DECISKPCMNNGVCHNTQGS-YVCECPPGFSGMDCEEDINDCLANPCQNGGSC-VDHVNT 615

I I- 1:11:: I : | |:| ||| |: ||::|: | ::|| I ::| | : 
Qy 181 DLCLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGR 240

Db 616 FSCQCHPGFIGDKCQTDMNECLSEPCKNGGTC SD Y-VNSYTCTCPA 660 

hi I: II II I: ::::!:: I III I |: : :||| I || 
Qy 241 FNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPM 300

Db 661 GFHGVHCENNIDECTE--SSCFNGGTCVDGINSFSCLCPVGFTGPFCLHDINECSSNPCL 718 

: I III:": II ::| I I I: hlhh Mil I :|::| : I 
Qy 301 EYEGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQ 360

Db 719 NAGTCVDGLGTYRCICPLGYTGKNCQ 744 

1:1:1111: :| hi I: |: 
Qy 361 NGGSCVDGILSYDCLCRPGYAGQYCE 386



RESULT 11 

ID O35516 PRELIMINARY; PRT; 2470 AA.

AC O35516;
DT 01-JAN-1998 (TREMBLREL. 05, CREATED)
DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE CELL SURFACE PROTEIN.
GN NOTCH2.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 
OC SCIUROGNATHI; MURIDAE; MURINAE; MUS. 
RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=C57B/6; TISSUE=THYMUS;

RX MEDLINE; 93178563.

RA LARDELLI M., LENDAHL U.;

RT "Motch A and motch B-two mouse Notch homologues coexpressed in a

RT wide variety of tissues.";

RL EXP. CELL RES. 204:364-372(1993).

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=C57B/6; TISSUE=THYMUS;

RA HAMADA Y., HIGUCHI M., TSUJIMOTO Y.; 

RL SUBMITTED (JUL-1994) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; D32210; D1022953; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS01186; EGF_2; 27.

DR PROSITE; PS01187; EGF_CA; 22.

DR PFAM; PF00008; EGF; 34. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 2.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 2470 AA; 265325 MW; CA94E03A CRC32;

Query Match 9.9%; Score 540; DB 11; Length 2470;

Best Local Similarity 38.8%; Pred. No. 1.32e-86; 

Matches 80; Conservative 50; Mismatches 57; Indels 19; Gaps 6; 

Db 873 DECISKPCMNNGVCHNTQGS-YVCECPPGFSGMDCEEDINDCLANPCQNGGSC-VDHVNT 930 

I I:: ll:ll::| I : I |:| III |: ||::|: I ::|| I ::| | : 
Qy 181 DLCLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGR 240 

Db 931 FSCQCHPGFIGDKCQTDMNECLSEPCKNGGTC SD Y-VNSYTCTCPA 975 

1:1 I: II II I: ::::|:: I III I |: : :||| I II 
Qy 241 FNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELRNFQSFQINSYRCDCPM 300 

Db 976 GFHGVHCENNIDECTE--SSCFNGGTCVDGINSFSCLCPVGFTGPFCLHDINECSSNPCL 1033 

: I III":: II ::| I I |: |:||:|: llll I :|::| : I 
Qy 301 EYEGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQ 360 

Db 1034 NAGTCVDGLGTYRCICPLGYTGKNCQ 1059 

1:1:1111: :| |:| l|:| |: 
Qy 361 NGGSCVDGILSYDCLCRPGYAGQYCE 386 



RESULT 12

ID Q90656 PRELIMINARY; PRT; 728 AA. 

AC Q90656; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE TRANSMEMBRANE PROTEIN C-DELTA-1.

OS GALLUS GALLUS (CHICKEN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ARCHOSAURIA; AVES;

OC NEOGNATHAE; GALLIFORMES; PHASIANIDAE; PHASIANINAE; GALLUS.

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE-SPINAL CORD; 

RX MEDLINE; 95319507. 

RA HENRIQUE D., ADAM J., MYAT A., CHITNIS A., LEWIS J., ISH-HOROWICZ D.; 

RT "Expression of a Delta homologue in prospective neurons in the 

RT chick."; 

RL NATURE 375:787-790(1995). 

DR EMBL; U26590; G882412; -. 

DR PROSITE; PS01186; EGF_2; 8.

DR PROSITE; PS01187; EGF_CA; 2.

DR PFAM; PF00008; EGF; 6. 

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 728 AA; 79861 MW; 7439F575 CRC32; 

Query Match 9.8%; Score 533; DB 13; Length 728; 

Best Local Similarity 38.7%; Pred. No. 4.42e-85;

Matches 79; Conservative 40; Mismatches 68; Indels 17; Gaps 12; 

Db 302 HKPCKNGATCTNTGQGSYTCSCRPGYTGSSCEIEINECDANPCKNGGSCTDLENS-YSCT 360 

: llll I I I: llhl lh I II :|: I ::|| I ::| : : ::| 
Qy 185 NSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFNCY 244 

Db 361 CPPGFYGKNCE-LSA-M-T-CADGP-CFNGGR-CTD--N--P-D-GGYSCRCPLGYSG 406

I II I II : I: lh: I : : :| I ||: I I 

Qy 245 CNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEYEG 304 

Db 407 FNCEKKIDYCSS--SPCANGAQCVDLGNSYICQCQAGFTGRHCDDNVDDCASFPCVNGGT 464

:H l::lh :|| I : I: : II I I :|||| :|: hill : I III: 
Qy 305 KHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNGGS 364 

Db 465 CQDGVNDYSCTCPPGYNGKNCSTP 488 

III: I I I III I I I 
Qy 365 CVDGILSYDCLCRPGYAGQYCEIP 388 



RESULT 13 

ID Q91902 PRELIMINARY; PRT; 721 AA.

AC Q91902; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE X-DELTA-1. 

OS XENOPUS LAEVIS (AFRICAN CLAWED FROG).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; AMPHIBIA; BATRACHIA; ANURA; 

OC MESOBATRACHIA; PIPOIDEA; PIPIDAE; XENOPODINAE; XENOPUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 95319507. 

RA HENRIQUE D., ADAM J., MYAT A., CHITNIS A., LEWIS J., ISH-HOROWICZ D.;

RT "Expression of a Delta homologue in prospective neurons in the 

RT chick."; 

RL NATURE 375:787-790(1995). 

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 95319503. 

RA CHITNIS A.B., HENRIQUE D., LEWIS J., ISH-HOROWICZ D., KINTNER C.R.;

RT "Primary neurogenesis in Xenopus embryos regulated by a homologue of 

RT the Drosophila neurogenic gene Delta."; 

RL NATURE 375:761-766(1995).

DR EMBL; L42229; G807696; -.

DR PROSITE; PS01186; EGF_2; 8.

DR PROSITE; PS01187; EGF_CA; 2.

DR PFAM; PF00008; EGF; 6.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 721 AA; 79922 MW; 028040EF CRC32; 

Query Match 9.7%; Score 528; DB 13; Length 721; 

Best Local Similarity 39.2%; Pred. No. 5.41e-84;

Matches 80; Conservative 39; Mismatches 68; Indels 17; Gaps 12; 

Db 297 HKPCENGATCTNTGQGSYTCSCRPGYTGSNCEIEVNECDANPCKNGGSCSDLENS-YTCS 355 

: II I I I I: llhl lh I :ll ::: I ::ll I ::| : : : I 
Qy 185 NSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFNCY 244 

Db 356 CPPGFYGKNCE-LSA-M-T-CADGP-CFNGGR-CAD—N-P-D-GGYICFCPVGYSG 401 

I II I II : : : I :| I : I I:: I : : :|| ||: | | 
Qy 245 CNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEYEG 304 

Db 402 FNCEKKIDYCSS--NPCANGARCEDLGNSYICQCQEGFSGRNCDDNLDDCTSFPCQNGGT 459

l::ll: III I ::| : II I I Ihl lh hill : llllh
Qy 305 KHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNGGS 364

Db 460 CQDGINDYSCTCPPGYIGKNCSMP 483 

I III I I I III I I :| 
Qy 365 CVDGILSYDCLCRPGYAGQYCEIP 388 



RESULT 14 

ID P87357 PRELIMINARY; PRT; 717 AA.

AC P87357; 

DT 01-MAY-1997 (TREMBLREL. 03, CREATED) 

DT 01-MAY-1997 (TREMBLREL. 03, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE DELTAD TRANSMEMBRANE PROTEIN PRECURSOR. 

GN DELTAD. 

OS BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII; 

OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA; 

OC CYPRINIDAE; RASBORINAE; DANIO. 

RN [1] 

RP SEQUENCE FROM N.A.

RX MEDLINE; 97346722. 

RA DORNSEIFER P., TAKKE C., CAMPOS-ORTEGA J.A.;

RT "Overexpression of a zebrafish homologue of the Drosophila neurogenic
RT gene Delta perturbs differentiation of primary neurons and somite
RT development.";
RL MECH. DEV. 63:159-171(1997).

DR EMBL; Y11760; E307461; -.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 2.
DR PFAM; PF00008; EGF; 6. 

KW SIGNAL; TRANSMEMBRANE; GLYCOPROTEIN; EGF-LIKE DOMAIN. 
FT SIGNAL 1 19 POTENTIAL. 

FT CHAIN 20 717 DELTAD TRANSMEMBRANE PROTEIN. 

SQ SEQUENCE 717 AA; 79061 MW; 5CC32ECA CRC32; 

Query Match 9.6%; Score 524; DB 13; Length 717; 

Best Local Similarity 40.3%; Pred. No. 4.01e-83;

Matches 81; Conservative 37; Mismatches 66; Indels 17; Gaps 12; 

Db 293 HKPCQNGATCTNTGQGSYTCSCRPGFTGDSCEIEVNECSGSPCRNGGSCTDLENT-YSCT 351

Ml I II h llhl III I II ::: I INI I ::| : ::| 
Qy 185 NSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFNCY 244

Db 352 CPPGFYGRNCE--LSA-M-T-CADGP-CFNGGH-CAD---N--P-E-GGYFCQCPMGYAG 397 

I II I II : : : | :| | : : |:: | : : :| |:||| | | 
Qy 245 CNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEYEG 304 

Db 398 FNCEKKIDHCSS--NPCSNDAQCLDLVDSYLCQCPEGFTGTHCEDNIDECATYPCQNGGT 455

= 11 h: h III h: h : II I h llll :|| |||:| Mil: 
Qy 305 KHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNGGS 364 

Db 456 CQDGLSDYTCTCPPGYTGKNC 476 

I lh III llhl I 
Qy 365 CVDGILSYDCLCRPGYAGQYC 385 



RESULT 15 

ID Q99108 PRELIMINARY; PRT; 832 AA. 

AC Q99108; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS DELTA PROTEIN PRECURSOR (VERSION 2). 

GN DL. 

OS DROSOPHILA MELANOGASTER (FRUIT FLY). 

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN-OREGON-R; TISSUE-EMBRYO; 

RX MEDLINE; 89196890. 

RA KOPCZYNSKI C.C., ALTON A.K., FECHTEL K., KOOH P.J.,

RA MUSKAVITCH M.A.T.; 

RT "Delta, a Drosophila neurogenic gene, is transcriptionally complex 

RT and encodes a protein related to blood coagulation factors and 

RT epidermal growth factor of vertebrates."; 

RL GENES DEV. 2:1723-1735(1988).

RN [2]

RP SEQUENCE OF 422-621 FROM N.A.

RX MEDLINE; 87218537.

RA KNUST E., DIETRICH U., TEPASS U., BREMER K.A., WEIGEL D., VAESSIN H.,

RA CAMPOS-ORTEGA J.A.;

RT "EGF homologous sequences encoded in the genome of Drosophila 

RT melanogaster, and their relation to neurogenic genes."; 

RL EMBO J. 6:761-766(1987). 

RN [3] 

RP PATTERN OF TRANSCRIPTION. 

RX MEDLINE; 91209246. 

RA HAENLIN M., KRAMATSCHEK B., CAMPOS-ORTEGA J.A.;

RT "The pattern of transcription of the neurogenic gene Delta of

RT Drosophila melanogaster.";

RL DEVELOPMENT 110:905-914(1990). 

CC -!- FUNCTION: ESSENTIAL FOR PROPER DIFFERENTIATION OF ECTODERM. 

CC DL IS REQUIRED FOR THE CORRECT SEPARATION OF NEURAL AND EPIDERMAL 

CC CELL LINEAGES. 

CC -!- SEPARATION OF NEUROBLASTS FROM THE ECTODERM INTO THE INNER PART 

CC OF EMBRYO IS ONE OF THE FIRST STEPS OF CNS DEVELOPMENT IN 

CC INSECTS. THIS PROCESS IS UNDER CONTROL OF THE NEUROGENIC GENES.

CC -!- NOTCH AND SERRATE MAY INTERACT AT THE PROTEIN LEVEL. IT IS

CC CONCEIVABLE THAT THE SERRATE AND DELTA PROTEINS MAY COMPETE FOR 

CC BINDING WITH THE NOTCH PROTEIN. 

CC -!- SIMILARITY: THE PROTEIN INCLUDES 9 EGF-LIKE REPEATS. 

CC -!- SIMILARITY: TO DROSOPHILA SERRATE PROTEIN 

CC (AC P18168), AND VERTEBRATE BLOOD COAGULATION FACTOR IX. 

DR EMBL; Y00222; G577774; -. 

DR FLYBASE; FBgn0000463; Dl. 

DR PFAM; PF00008; EGF; 8. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; TRANSMEMBRANE; EGF-LIKE DOMAIN; 

KW GLYCOPROTEIN; SIGNAL. 



FT SIGNAL 1 25 POTENTIAL.

FT CHAIN 26 832 DELTA PROTEIN.

FT DOMAIN 26 595 EXTRACELLULAR (POTENTIAL).

FT DOMAIN 217 566 9 EGF-TYPE REPEATS.

FT TRANSMEM 569 617 POTENTIAL.

FT DOMAIN 618 832 INTRACELLULAR (POTENTIAL).

FT REPEAT 217 256 EGF-LIKE 1.

FT REPEAT 257 290 EGF-LIKE 2.

FT REPEAT 291 330 EGF-LIKE 3.

FT REPEAT 331 373 EGF-LIKE 4.

FT REPEAT 374 417 EGF-LIKE 5.

FT REPEAT 418 452 EGF-LIKE 6.

FT REPEAT 453 490 EGF-LIKE 7.

FT REPEAT 491 528 EGF-LIKE 8.

FT REPEAT 529 566 EGF-LIKE 9.

FT CARBOHYD 98 98 POTENTIAL.

FT CARBOHYD 137 137 POTENTIAL.

FT CARBOHYD 167 167 POTENTIAL.

FT CARBOHYD 421 421 POTENTIAL.

FT CARBOHYD 649 649 POTENTIAL.

FT CONFLICT 437 438 GK -> ET (IN REF. 2).

FT CONFLICT 443 443 A -> S (IN REF. 2).

FT CONFLICT 459 459 G -> A (IN REF. 2).

FT CONFLICT 490 490 S -> T (IN REF. 2).

FT CONFLICT 591 591 T -> A (IN REF. 2).

SQ SEQUENCE 832 AA; 88813 MW; CF9ABEC1 CRC32;



Query Match 9.6%; Score 521; DB 5; Length 832;

Best Local Similarity 37.1%; Pred. No. 1.80e-82; 

Matches 95; Conservative 55; Mismatches 77; Indels 29; Gaps 22; 

Db 295 CTNHRPCKNGGTCFNTGEGLYTCKCAPGYSGDDCENEIYSCDADVNPCQNGGTCIDEPHT 354 

I I MM : I I: 1 1 1 : 1 : 1 1 : I Ml: :| : :|| I ;|| ::: 
Qy 183 CLN-SPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYG-SPCLNNATC-KVAQA 238 

Db 355 KTGYKCHCANGWSGKMCEEKVLTCSDKPCHQG-ICRN-VR--PG-LGS-KG-Q--GYQCE 405

I I :| I II :: I : I I I : II : I : : I :|:|:

Qy 239 GR-FNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCD 297

Db 406 CPIGYSGPNCDLQLDNCSP--NPCINGGSCQP-SGK-CICPAGFSGTRCETNIDDCLGH 460

lh I I :h h h III I I I I :| |:|::||:| MINIM 
Qy 298 CPMEYEGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNV 357 

Db 461 QCENGGTCIDMVNQYRCQCVPGFHGTHC--SSKVDL-CL-IRPC--AN-G-GTCL-NLNN 511 

:|:MI:|: : M I l|: I I ::::|: :| : I I |: : |: 
Qy 358 ECQNGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNS 417 

Db 512 -DYQCTCRAGFTGKDC 526 

I: I I: 1:1 I 
Qy 418 SDFTCKCHEGFSGPSC 433 



Search completed: Fri May 28 09:15:05 1999
Job time: 116 secs.

Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 09:13:09 1999; MasPar time 43.77 Seconds

916.462 Million cell updates/sec

Tabular output not generated.

Title: MJS-09-191-647-9

Description: (1-735) from US09191647.pep

Perfect Score: 5438 

Sequence: 1 SNKNLTSFPSRIPFDTTELY TVHIIRQCQCEPTKSVLSEK 735 

Scoring table: PAM 150
Gap 11

Searched: 179066 seqs, 54579741 residues 

Post-processing: Minimum Match 0%

Listing first 45 summaries 

Database: sptrembl9

1:sp_archea 2:sp_bacteria 3:sp_fungi 4:sp_human
5:sp_invertebrate 6:sp_mammal 7:sp_mhc 8:sp_organelle
9:sp_phage 10:sp_plant 11:sp_rodent 12:sp_unclassified
13:sp_vertebrate 14:sp_virus

Statistics: Mean 49.268; Variance 89.604; scale 0.550 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
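The percentage fields printed with each result appear to follow from simple ratios of the other reported numbers. The sketch below is an inference from the figures in this report, not documented MPsrch behaviour: it assumes "Query Match" is Score divided by Perfect Score, and "Best Local Similarity" is Matches divided by the total number of aligned columns. The function names are hypothetical.

```python
def query_match_pct(score, perfect_score):
    """Score as a percentage of the query's perfect (self-match) score."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_pct(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all aligned columns."""
    aligned_columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / aligned_columns, 1)


# Values taken from this report (Perfect Score 5438, results 11 and 12).
print(query_match_pct(540, 5438))                  # reported as 9.9%
print(best_local_similarity_pct(80, 50, 57, 19))   # reported as 38.8%
print(best_local_similarity_pct(79, 40, 68, 17))   # reported as 38.7%
```

Under these assumptions the ratios reproduce the printed values, which is useful for spotting scan errors in the percentage columns.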



SUMMARIES

No.  Score  Query Match %  Length  DB  ID      Description             Pred. No.

  1   4486       82.5         601   5  Q20204  F40E10.4 PROTEIN (FRAG  0.00e+00
  2   1651       30.4        1531  11  088279  MEGF4.                  0.00e+00
  3   1551       28.5        1523  11  088280  MEGF5.                  0.00e+00
  4   1424       26.2         739   4  075094  MEGF5 (FRAGMENT).       9.79e-287
  5   1188       21.8         530   5  Q24526  SLIT LOCUS ENCODING A   1.98e-232
  6    578       10.6        2653   5  025253  NOTCH HOMOLOG SCALLOPE  6.53e-95
  7    570       10.5        2447  13  013149  NOTCH 2 (FRAGMENT).     3.70e-93
  8    565       10.4         529   5  Q25058  FIBROPELLIN IA (FRAGME  4.60e-92
  9    559       10.3         406   5  Q25059  FIBROPELLIN III (FRAGM  9.44e-91
 10    540        9.9        1203  11  Q06008  NOTCH PROTEIN HOMOLOG   1.32e-86
 11    540        9.9        2470  11  035516  CELL SURFACE PROTEIN.   1.32e-86
 12    533        9.8         728  13  Q90656  TRANSMEMBRANE PROTEIN   4.42e-85
 13    528        9.7         721  13  Q91902  X-DELTA-1.              5.41e-84
 14    524        9.6         717  13  P87357  DELTAD TRANSMEMBRANE P  4.01e-83
 15    521        9.6         832   5  Q99108  NEUROGENIC LOCUS DELTA  1.80e-82
 16    522        9.6        1476  13  090285  PUTATIVE EXTRACELLULAR  1.09e-82
 17    516        9.5         752  13  042374  NOTCH RECEPTOR PROTEIN  2.18e-81
 18    519        9.5        2531   5  016004  NOTCH HOMOLOG.          4.88e-82
 19    499        9.2         642  13  P79941  NOTCH LIGAND X-DELTA-2  1.05e-77
 20    500        9.2         802  13  057462  DELTAA.                 6.38e-78
 21    495        9.1        2352   5  061240  HRNOTCH PROTEIN.        7.67e-77
 22    492        9.0         615  13  057409  DELTAB.                 3.41e-76
 23    489        9.0        1218   4  015816  TRANSMEMBRANE PROTEIN   1.51e-75
 24    489        9.0        1218   4  014902  TRANSMEMBRANE PROTEIN   1.51e-75
 25    487        9.0        1218   4  015122  JAGGED1.                4.09e-75
 26    489        9.0        1227   4  P78504  JAGGED 1 (TRANSMEMBRAN  1.51e-75
 27    477        8.8         263   4  099734  NOTCH2 TRANSMEMBRANE P  5.83e-73
 28    480        8.8        1372   5  P91526  SIMILARITY TO MULTIPLE  1.32e-73
 29    471        8.7        1193  13  090819  C-SERATE-1 PROTEIN (FR  1.14e-71
 30    474        8.7        1219  11  063722  JAGGED PROTEIN.         2.57e-72
 31    461        8.5        1212  13  042347  C-SERRATE-2 (FRAGMENT)  1.59e-69
 32    463        8.5        1964  11  035442  NOTCH4.                 5.92e-70
 33    444        8.2        1202  11  P97607  JAGGED2 (FRAGMENT).     6.87e-66
 34    445        8.2        1687  11  Q61204  NOTCH2-LIKE (EGF REPEA  4.20e-66
 35    447        8.2        1722   5  019350  SIMILAR TO EGF-LIKE RE  1.57e-66
 36    426        7.8         434  11  055139  JAGGED2 PROTEIN (FRAGM  4.66e-62
 37    422        7.8        1999   4  099940  NOTCH4.                 3.29e-61
 38    422        7.8        2003   4  000306  NOTCH4.                 3.29e-61
 39    417        7.7         955   4  099466  NOTCH4 (FRAGMENT).      3.77e-60
 40    408        7.5         518  11  070219  JAGGED 2 (JAGGED 2 PRO  3.02e-58
 41    404        7.4         383  11  070534  ZOG.                    2.11e-57
 42    404        7.4         383  11  062779  PREADIPOCYTE FACTOR 1.  2.11e-57
 43    384        7.1         387  11  006007  NOTCH PROTEIN HOMOLOG   3.37e-53
 44    378        7.0         156   5  Q26661  EPIDERMAL GROWTH FACTO  6.06e-52
 45    380        7.0         473   5  025464  ADHESIVE PLAQUE MATRIX  2.31e-52



ALIGNMENTS 

RESULT 1 

ID Q20204 PRELIMINARY; PRT; 601 AA.

AC Q20204;

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE) 

DE F40E10.4 PROTEIN (FRAGMENT).

GN F40E10.4.

OS CAENORHABDITIS ELEGANS.

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA; 

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RA SMYE R.;

RL SUBMITTED (FEB-1996) TO EMBL/GENBANK/DDBJ DATA BANKS. 

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 94150718. 

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,

RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,

RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,

RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,

RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,

RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.

RT elegans."; 

RL NATURE 368:32-38(1994). 

DR EMBL; Z69792; E1346469; -.

DR PROSITE; PS01187; EGF_CA; 1.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 601 AA; 66669 MW; F8A72773 CRC32;

Query Match 82.5%; Score 4486; DB 5; Length 601;

Best Local Similarity 100.0%; Pred. No. 0.00e+00;

Matches 601; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Db 1 IKSKFIEAGIARCEYPNTVSNQLLLTAQPYQFTCDSKVPTKLATKCDLCLNSPCKNNAIC 60

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy 135 IKSKFIEAGIARCEYPNTVSNQLLLTAQPYQFTCDSKVPTKLATKCDLCLNSPCKNNAIC 194

Db 61 ETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFNCYCNKGFEGDYC 120

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy 195 ETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFNCYCNKGFEGDYC 254

Db 121 EKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEYEGKHCEDKLEYC 180

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy 255 EKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEYEGKHCEDKLEYC 314

Db 181 TKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNGGSCVDGILSYDC 240

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy 315 TKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNGGSCVDGILSYDC 374

Db 241 LCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNSSDFTCKCHEGFSGPSCD 300

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy 375 LCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNSSDFTCKCHEGFSGPSCD 434

Db 301 RQMSVGFKNPGAYLALDPLASDGTITMTLRTTSKIGILLYYGDDHFVSAELYDGRVKLVY 360

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy 435 RQMSVGFKNPGAYLALDPLASDGTITMTLRTTSKIGILLYYGDDHFVSAELYDGRVKLVY 494

Db 361 YIGNFPASHMYSSVKVNDGLPHRISIRTSERKCFLQIDKNPVQIVENSGKSDQLITKGKE 420

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy 495 YIGNFPASHMYSSVKVNDGLPHRISIRTSERKCFLQIDKNPVQIVENSGKSDQLITKGKE 554

Db 421 MLYIGGLPIEKSQDAKRRFHVKNSESLKGCISSITINEVPINLQQALENVNTEQSCSATV 480

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy 555 MLYIGGLPIEKSQDAKRRFHVKNSESLKGCISSITINEVPINLQQALENVNTEQSCSATV 614

Db 481 NFCAGIDCGNGKCTNNALSPKGYMCQCDSHFSGEHCDEKRIKCDKQKFRRHHIENECRSV 540

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy 615 NFCAGIDCGNGKCTNNALSPKGYMCQCDSHFSGEHCDEKRIKCDKQKFRRHHIENECRSV 674

Db 541 DRIKIAECNGYCGGEQNCCTAVKKKQRKVKMICKNGTTKISTVHIIRQCQCEPTKSVLSE 600

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy 675 DRIKIAECNGYCGGEQNCCTAVKKKQRKVKMICKNGTTKISTVHIIRQCQCEPTKSVLSE 734

Db 601 K 601

|
Qy 735 K 735



RESULT 2 

ID 088279 PRELIMINARY; PRT; 1531 AA.
AC 088279; 

DT 01-NOV-1998 (TREMBLREL. 08, CREATED)

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE MEGF4.

OS RATTUS NORVEGICUS (RAT).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN-SPRAGUE-DAWLEY; TISSUE-BRAIN; 

RX MEDLINE; 98360089. 

RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.; 

RT "Identification of high-molecular-weight proteins with multiple

RT EGF-like motifs by motif-trap screening.";

RL GENOMICS 51:27-34(1998). 

DR EMBL; AB011530; D1033423; -.

DR PROSITE; PS01185; CTCK_1; 1.

DR PROSITE; PS01186; EGF_2; 8.

DR PROSITE; PS01187; EGF_CA; 2.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 1531 AA; 167497 MW; 5C5EBDF4 CRC32; 

Query Match 30.4%; Score 1651; DB 11; Length 1531; 

Best Local Similarity 39.4%; Pred. No. 0.00e+00;

Matches 259; Conservative 162; Mismatches 199; Indels 37; Gaps 25; 

Db 748 SNKHLQALPKGIPKNVTELYLDGNQFTLVPGQ-LSTFKY-LQLVDLSNNKISSLSNSSFT 805

111 = 1 = = l II : lllllhl : I: : I I :|||:|:: II |::|:
Qy 1 SNKNLTSFPSRIPFDTTELYLDANYINEIPAHDLNRL-YSLTKLDLSHNRLISLENNTFS 59



Db 806 NMSQLTTLILSYNALQCIPPLAFQGLRSLRLLSLHGNDVSTLQEGIFADVTSLSHLAIGA 865 

cn Mil II Hhllllllhl I :: |:::| |::|:|:| : 

Qy 60 NLTRLSTLIISYNKLRCLQPLAFNGLNALRILSLHGNDISFLPQSAFSNLTSITHIAVGS 119 

Db 866 NPLYCDCHLRWLSSWVKTGYKEPGIARCAGPPEMEGKLLLTTPAKKFECQGPPSLAVQAK 925 

n ,„ n 1 = 11111- 1 = 1 1 = 1: : I: : ||||: : | | :; . ■ : | 

Qy 120 NSLYCDCNMAWFSKWIKSKFIEAGIARCEYPNTVSNQLLLTAQPYQFTCDSKVPTKLATK 179 

Db 926 CDPCLSSPCQNQGTCHNDPLEVYRCTCPSGYKGRNCEVSLDSCSSNPCGNGGTCHAQEGE 985 

n 10n II H : lll 1 = 1 = I I I :h I :|| :|:| :;|| hn :: 

Qy 180 CDLCLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQA- 238 

Db 986 DAGFTCSCPSGFEGLTCGMNTDDCVKHDCVNGG VC— V-D--G-IGNYTCQ 1030 

A „ fl I I =1 UN I : : : | :| |: 

Qy 239 GR-FNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCD 297 

Db 1031 CPLQYTGRACEQLVDFCSPDLNPCQHEAQCVGTPEGPRCECVPGYTGDNCSKNQDDCKDH 1090 

H = = l 1= H= -I: lllh-: h : I I l|:||:|| I |ll|; 

Qy 298 CPMEYEGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNV 357 

Db 1091 QCQNGAQCVDEINSYACLCAEGYSGQLCEIPPAPRNSCEGTE-CQ-N-G-ANCVD-QGS 1144 

' =HII: III I II III ||:|| INN ; |: || : | :; || | | 

Qy 358 ECQNGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNS 417 

Db 1145 RP-VCQCLPGFGGPECEKLLSVNFVDRDTYLQFTDLQNWPRANITLQVSTAEDNGILLYN 1203 

A J10 1 1 H s 1 1 1 = = HI I : :|| : I : : ||: : |: 

Qy 418 SDFTCKCHEGFSGPSCDRQMSVGFKNPGAYIALDPLAS-D-GTITMTLRTTSKIGILLYY 475 

Db 1204 GDNDHIAVELYQGHVRVSYDPGSYPSSAIYSAETINDGQFHTVELVTFDQMVNLSIDGGS 1263 

A tnc H= == 111:1: ::|:| :||: :||| | : : | ::: | || : 

Qy 476 GDDHFVSAELYDGRVKLVYYIGNFPASHMYSSVKVNDGLPHRISIRTSERKCFLQIDKNP 535 

Db 1264 PMTMDNFGKHYTLNSEA-P-LYVGGMPVDVNSAAFRLWQILNGTSFHGCIRNLYINN— 1318 

n „ e = = l II I : : ||:||:|:: : | | :: | : | : ||| !: || ; 

Qy 536 VQIVENSGKSDQLITKGKEMLYIGGLPIEKSQDAKRRFHVKNSESLKGCISSITINEVPI 595 

Db 1319 ELQD-FTKTQMKPGWPGCEPCRKLYCLHGICQPNA-TP-GPVCHCEAGWGGLHCDQ 1372 

Qy 596 NLQQALENVNTEQSCSATVNFCAGIDCGNGKCTNNALSPKGYMCQCDSHFSGEHCDE 652



RESULT 3 

ID 088280 PRELIMINARY; PRT; 1523 AA. 
AC 088280; 

DT 01-NOV-1998 (TREMBLREL. 08, CREATED) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE MEGF5.

GN MEGF5.

OS RATTUS NORVEGICUS (RAT).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS. 
RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN-SPRAGUE-DAWLEY; TISSUE-BRAIN; 

RX MEDLINE; 98360089. 

RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.; 

RT "Identification of high-molecular-weight proteins with multiple 

RT EGF-like motifs by motif-trap screening.";

RL GENOMICS 51:27-34(1998).

DR EMBL; AB011531; D1033424; -.

DR PROSITE; PS01185; CTCK_1; 1.

DR PROSITE; PS01186; EGF_2; 7.

DR PROSITE; PS01187; EGF_CA; 2.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 1523 AA; 167767 MW; 2BD845D0 CRC32;

Query Match 28.5%; Score 1551; DB 11; Length 1523;

Best Local Similarity 36.7%; Pred. No. 0.00e+00; 

Matches 241; Conservative 173; Mismatches 209; Indels 34; Gaps 17; 
Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 08:22:40 1999; MasPar time 53.13 Seconds 

610.419 Million cell updates/sec 

Tabular output not generated. 



Title: MJS-09-191-647-2

Description: (1-1525) from US09191647.pep

Perfect Score: 11299

Sequence: 1 MRGVGWQMLSLSLGLVLAIL SSFVDEVEKWKCGCTRCVS 1525

Scoring table: PAM 150
Gap 11



Searched: 170751 seqs, 21266608 residues

Post-processing: Minimum Match 0% 

Listing first 45 summaries 



Database: a-geneseq35

1:part1 2:part2 3:part3 4:part4 5:part5 6:part6 7:part7
8:part8 9:part9 10:part10 11:part11 12:part12 13:part13
14:part14 15:part15 16:part16 17:part17 18:part18
19:part19 20:part20 21:part21 22:part22 23:part23
24:part24 25:part25 26:part26 27:part27 28:part28
29:part29 30:part30 31:part31 32:part32 33:part33
34:part34 35:part35 36:part36 37:part37 38:part38
39:part39

Statistics: Mean 40.463; Variance 233.972; scale 0.173


Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
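The "Million cell updates/sec" figure in the header can be related to how a Smith-Waterman search works: each query residue is compared against each database residue, so one dynamic-programming cell is filled per residue pair. The sketch below is illustrative only. It uses a simplified identity scoring scheme with a linear gap penalty (an assumption; the run itself used PAM 150 with Gap 11, and MPsrch's exact gap model is not stated here), and the function names are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (toy scoring, linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def cell_updates_per_sec(query_len, db_residues, seconds):
    """One cell update per (query residue, database residue) pair."""
    return query_len * db_residues / seconds


print(smith_waterman_score("ACGT", "ACGT"))
# Query of 1525 residues against 21266608 database residues in 53.13 s:
print(cell_updates_per_sec(1525, 21266608, 53.13) / 1e6)
```

With the numbers from this header, the throughput estimate comes out at roughly 610.4 million cell updates per second, in line with the reported figure.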

SUMMARIES

No.  Score  Query Match %  Length  DB  ID      Description            Pred. No.

  1   7810       69.1        1534  30  W46966  Amino acid sequence o  0.00e+00
  2   4159       36.8        1480   5  R25079  Drosophila SLIT prote  1.40e-293
  3   1039        9.2         228  30  W46967  Amino acid sequence o  7.55e-63
  4    838        7.4         183  30  W46968  Amino acid sequence o  2.62e-48
  5    740        6.5        1872  36  W68510  Partial human Notch-3  2.86e-41
  6    734        6.5        2321  36  W49698  Human Notch3 protein.  7.69e-41
  7    666        5.9         727  21  W11719  C-Delta-1 polypeptide  5.53e-36
  8    666        5.9         740  21  W00876  C-Delta-1 polypeptide  5.53e-36
  9    650        5.8        1036  25  W18351  Proliferation and dif  7.61e-35
 10    650        5.8        1187  25  W18352  Proliferation and dif  7.61e-35
 11    655        5.8        1218  25  W18354  Proliferation and dif  3.35e-35
 12    650        5.8        1218  19  W05833  Human Serrate-1 (HJ1)  7.61e-35
 13    650        5.8        1218  29  W44301  Human serrate 1.       7.61e-35
 14    644        5.7         520  25  W18348  Proliferation and dif  2.03e-34
 15    644        5.7         702  25  W13349  Proliferation and dif  2.03e-34
 16    644        5.7         723  25  W18353  Proliferation and dif  2.03e-34
 17    647        5.7        1055  29  W44298  Human serrate 2 prote  1.24e-34
 18    648        5.7        1193  19  W05835  Chick Serrate.         1.06e-34
 19    649        5.7        1208  28  W40827  Human Jagged protein.  8.96e-35
 20    647        5.7        1212  29  W44299  Human serrate 2.       1.24e-34
 21    647        5.7        1257  19  W05834  Human Serrate-2 (HJ2)  1.24e-34
 22    627        5.5         722  21  W11720  M-Delta-1 polypeptide  3.28e-33
 23    608        5.4         612  28  W39256  Human partial mature   7.30e-32
 24    608        5.4         737  28  W39257  Human membrane protei  7.30e-32
 25    581        5.1         685  37  W80813  Nucleotide sequence o  5.95e-30
 26    554        4.9         833   6  R28960  Delta Dll.             4.78e-28
 27    507        4.5         473  17  R86869  Adhesive protein.      9.57e-25
 28    494        4.4         560  12  R71294  Human glycoprotein V.  7.77e-24
 29    492        4.4         660  21  W11725  H-Delta-1 polypeptide  1.07e-23
 30    486        4.3        1091  27  W41641  Sequence used in dete  2.82e-23
 31    464        4.1        1404   7  R38304  Sequence of a serrate  9.61e-22
 32    432        3.8         157  21  W11730  H-Delta-1 polypeptide  1.59e-19
 33    428        3.8         383  10  R56166  Neuroendocrine tumor   3.01e-19
 34    405        3.6         385  10  R56167  Neuroendocrine tumor   1.16e-17
 35    401        3.5         196   5  R29102  Drosophila SLIT prote  2.19e-17
 36    394        3.5         345  23  W09405  Pineal gland specific  6.61e-17
 37    390        3.5         353   1  R05160  Sequence of human bon  1.24e-16
 38    371        3.3        2189   1  R05222  Antigen GX5401FL enco  2.48e-15
 39    364        3.2         186   8  R42264  Decorin sequence PT-7  7.42e-15
 40    366        3.2         234   8  R42265  Decorin sequence PT-7  5.43e-15
 41    366        3.2         280   8  R42266  Decorin sequence PT-7  5.43e-15
 42    366        3.2         305   8  R42267  Decorin sequence PT-7  5.43e-15
 43    366        3.2         331   8  R42260  Mature decorin PT-65.  5.43e-15
 44    366        3.2         342  17  R89439  Human recombinant dec  5.43e-15
 45    366        3.2        1388  18  R89471  Collagen/decorin fusi  5.43e-15


ALIGNMENTS 

RESULT 1 

ID W46966 standard; Protein; 1534 AA. 

AC W46966; 

DT 06-JUL-1998 (first entry) 

DE Amino acid sequence of a human slit-like polypeptide. 

KW Slit-like protein; human; diagnosis; treatment; brain-specific disease; 

KW cancer; antibody. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT Peptide 1..26 

FT /note= "signal peptide"

FT Protein 27..1534

FT /note= "mature protein"

PN J10087699-A.

PD 07-APR-1998.

PF 15-JUL-1997; 205351. 

PR 16-JUL-1996; JP-186219. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

DR DPI; 98-267127/24. 

DR N-PSDB; V16978. 

PT Human Slit-like protein - useful for diagnosis and treatment of

PT brain-specific diseases and cancers 

PS Disclosure; Pages 31-35; 45pp; Japanese. 

CC The present sequence represents a novel human slit-like protein (the 

CC mature protein is claimed in claim 1). The slit-like polypeptide is 

CC useful for diagnosis and treatment of brain-specific diseases and 

CC cancers. Antibodies directed against the protein, or its fragments 

CC can also be used for diagnosing cancer. 
SQ Sequence 1534 AA;

Query Match 69.1%; Score 7810; DB 30; Length 1534;

Best Local Similarity 65.4%; Pred. No. 0.00e+00;

Matches 986; Conservative 281; Mismatches 233; Indels 8; Gaps 5;

Db 28 rlgasacpalctctgttvdchgtglqaipkniprnterlelngnnitrihkndfaglkql 87

UN hhNIIIII :I:::|:|IIIIIIII:||IHIIII I 

Qy 22 KVAPQACPAQCSCSGSTVDCHGLALRSVPRNIPRNTERLDLNGNNITRITKTDFAGLRHL 81 

Db 88 rvlqlmenqigavergafddmkelerlrlnrnqlhmlpellfqnnqalsrldlsenaiqa 147 
NIMH |;::|||||:|:|||||||||||:|:::|||M I lllllll III 

Qy 82 RVLQLMENKISTIERGAFQDLKELERLRLNRNHLQLFPELLFLGTAKLYRLDLSENQIQA 141 

Db 148 iprkafrgatdlknlrldknqiscieegafralrglevltlnnnnittipvssfnhmpkl 207 

lllllllll !:IH:I! llllllllllll 
Qy 142 IPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPKL 201

Db 208 rtfrlhsnhlfcdchlawlsqwlrqrptiglftqcsgpaslrglnvaevqksefscsgqg 267 

II :ll:lll l|: III lllllll II II : 
Qy 202 RTFRLHSNNLYCDCHLAWLSDWLRKRPRVGLYTQCMGPSHLRGHNVAEVQKREFVCSDEE 261 

Db 268 eagrvptctlssg-scpamctcsngivdcrgkgltaipanlpetmteirlelngiksipp 326 

i: : i iii inn ininiiii nnnnniini i n in 

Qy 262 EGHQSFMAPSCSVLHCPAACTCSNNIVDCRGKGLTEIPTNLPETITEIRLEQNTIKVIPP 321 

Db 327 gafspyrklrridlsnnqiaeiapdafqglrslnslvlygnkitdlprgvfgglytlqll 386 

linil:|lllllllllll:|:|inilllllllllllliniini:::| l|::|||| 
Qy 322 GAFSPYKKLRRIDLSNNQISELAPDAFQGLRSLNSLVLYGNKITELPKSLFEGLFSLQLL 381 

Db 387 llnankincirpdafqdlqnlsllslydnkiqslakgtftslraiqtlhlaqnpficdcn 446 

llllllinn inill:||:|lllllll:|::lini::llllll:llllllllll|: 
Qy 382 LLNANKINCLRVDAFQDLHNLNLLSLYDNKLQTIAKGTFSPLRAIQTMHLAQNPFICDCH 441



Db 447 lkwladflrtnpietsgarcasprrlankrigqikskkfrcsakeqyfipgtedyqlnse 506

iinii:i:nininni:ininniniininni: in : |::: 

Qy 442 LKWLADYLHTNPIETSGARCTSPRRLANKRIGQIKSKKFRCSGTEDY--R-S---KLSGD 495

Db 507 cnsdvvcphkcrceanvvecsslkltkiperipqstaelrlnnneisileatgmfkklth 566

I :h II lllll: |:||: II 1111:111 I II 1 1 II 1 1 1: :: II I II :| 1 1 1 : 
Qy 496 CFADLACPEKCRCEGTTVDCSNQKLNKIPEHIPQYTAELRLNNNEFTVLEATGIFKKLPQ 555

Db 567 lkkinlsnnkvseiedgafegaasvselhltanqlesirsgmfrgldglrtlmlrnnris 626

I = 1 1 1 : 1 1 1 1 : : : ! 1 : 1 1 1 1 1 1 : : E : t : ll:|:||::: 1 1 : 1 1 : : I : ! 1 1 1 1 : 1 II : 
Qy 556 LRRINFSNNKITDIEEGAFEGASGVNEILLTSNRLENVQHKMFKGLESLKTLMLRSNRIT 615

Db 627 cihndsftglrnvrllslydnqittvspgafdtlqslstlnllanpfncncqlawlggwl 686

i: mi n ninnnnnnniiiiiinniiiniiiiini nn i 

Qy 616 CVGNDSFIGLSSVRLLSLYDNQITTVAPGAFDTLHSLSTLNLLANPFNCNCYLAWLGEWL 675

Db 687 rkrkivtgnprcqnpdflrqiplqdvafpdfrceegqeeggclprpqcpqecacldtvvr 746

ll::|||||||l|:| ll::||:llll: II l::l :: :f I ::|| 11:1111111 
Qy 676 RKKRIVTGNPRCQKPYFLKEIPIQDVAIQDFTCDDGNDDNSCSPLSRCPTECTCLDTVVR 735

Db 747 csnkhlralpkgipknvtelyldgnqftlvpgqlstfkylqlvdlsnnkisslsnssftn 806

nn i: niiinniniiniiinii m n i minmmii nn 

Qy 736 CSNKGLKVLPKGIPRDVTELYLDGNQFTLVPKELSNYKHLTLIDLSNNRISTLSNQSFSN 795

Db 807 msqlttlilsynalqcipplafqglrslrllslhgndistlqegifadvtslshlaigan 866

1:11 lllllll mill : 1 : 1 1 : 1 ! I II 1 1 1 1 M M : II I I ::: 1 1 1 1 1 1 1 1 1 
Qy 796 MTQLLTLILSYNRLRCIPPRTFDGLKSLRLLSLHGNDISVVPEGAFNDLSALSHLAIGAN 855

Db 867 plycdchlrwlsswvktgykepgiarcagpqdmegklllttpakkfecqgpptlavqakc 926

llllll:::lll III: 111111111111 :| 1111111:111 llll : : III 
Qy 856 PLYCDCNMQWLSDWVKSEYKEPGIARCAGPGEMADKLLLTTPSKKFTCQGPVDVNILAKC 915



Db 927 dlclsspcqnqgtchndplevyrcacpsgykgrdcevslnscssgpcenggtchaqeged 986
: 1 1 1 : 1 i minni:: 111:11 1 : 1 1 : 1 1 : 1 : " : I I II :||||| III: 

Qy 916 NPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEE 975 

Db 987 apftcscptgfegptcgvntddcvdhacanggvcvdgvgnytcqcplqyegkaceqlvdl 1046 

I I I: llll I II III 1:11: lllh llll II :| I II: :|: 

Qy 976 DGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDF 1035 

Db 1047 cspdlnpcqheaqcvgtpdgprcecmpgyagdncsenqddcrdhrcqngaqcmdevnsys 1106 

I: lllllll:: |: II I :|:| III |::| : ll|:|::| II 1 : I 1:1: 

Qy 1036 CAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYT 1095 

Db 1107 clcaegysgqlceipphlpapk-spcegtecqngancvdqgnrpvcqclpgfggpecekl 1165 

1 : 1 : 1 1 1 1 1 :'!:: : h III: mill I: : I 1:111111: I llll 

Qy 1096 CICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEKL 1155 

Db 1166 lsvnfvdrdtylqftdlqnwpranitlqvstaedngillyngdndhiavelyqghvrvsy 1225 

:||l|:::::|l|: :|::|||||::| 11:11111:11:11111111:1:11 II 

Qy 1156 VSVNFINKESYLQIPSAKVRPQTNITLQIATDEDSGILLYKGDKDHIAVELYRGRVRASY 1215 



Db 1226 dpgsypssaiysaetindgqfhtvelvafdqmvnlsidggspmtmdnfgkhytlnseapl 1285

i ii mini nun n nmm mmum : nn: m :m 

Qy 1216 DTGSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITNLSKQSTLNFDSPL 1275

Db 1286 yvggmpvdvnsaafrlwqilngtgfhgcirnlyinnelqdftktqmkpgvvpgcepcrkl 1345 

lllin I :: ' I M : 1 1 1 1 1 1 M 1 1 1 : 1 1 1 1 1 I I I : : 1 1 M 1 1 : f 
Qy 1276 YVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQDFQKVPMQTGILPGCEPCHKK 1335 

Db 1346 yclhgicqpnatpgpmchceagwvglhcdqpadgpchghkcvhgqcvpldalsyscqcqd 1405 

I II III:: :| I |: Ihl III :: II hHIII |:|::|:|lll I 
Qy 1336 VCAHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKCVHGTCLPINAFSYSCKCLE 1395 

Db 1406 gysgalcnqagalaepcrglqclhghcqasgtkgahcvcdpgfsgelceqesecrgdpvr 1465 

I :l II:: I :||::: I II h II : I I :|::|: |::| III: :| 
Qy 1396 GHGGVLCDEEEDLFNPCQAIKCKHGKCRLSGLGQPYCECSSGYTGDSCDREISCRGERIR 1455 

Db 1466 dfhqvqrgyaicqttrplswvecrgscpgqgccqglrlkrrkftfecsdgtsfaeevekp 1525 

i: i inn urn mmmm n n iimmmmi nin 

Qy 1456 DYYQKQQGYAACQTTKKVSRLECRGGCAGGQCCGPLRSKRRKYSFECTDGSSFVDEVEKV 1515 



Db 1526 tkcgcalc 1533

llll: I 
Qy 1516 VKCGCTRC 1523



RESULT 2 

ID R25079 standard; Protein; 1480 AA. 
AC R25079; 

DT 05-JAN-1993 (first entry) 

DE Drosophila SLIT protein involved in axon pathway development.
KW Neurogenesis; EGF-like repeats; epidermal growth factor; TAGON; 
KW embryonic CNS; leucine-rich repeat; Flank-LRR-Flank; 
KW midline glial cells; axonogenesis; cell-cell interaction; ss. 
OS Drosophila melanogaster.
FH Key Location/Qualifiers
FT peptide 1..36
FT /label= signal
FT domain 73..294
FT /label= Flank_LRR_Flank_1
FT /note= "mediates adhesive events"
FT domain 295..518
FT /label= Flank_LRR_Flank_2
FT /note= "mediates adhesive events"
FT domain 519..714
FT /label= Flank_LRR_Flank_3
FT /note= "mediates adhesive events"
FT domain 715..910
FT /label= Flank_LRR_Flank_4
FT /note= "mediates adhesive events"
FT region 911..1150
FT /label= Tandem_EGF_like_repeats
FT /note= "involved in protein-protein interactions"
FT region 1353..1393
FT /label= 7th_EGF_like_repeat
FT /note= "involved in receptor-ligand interactions"
FT region 1394..1404
FT /label= alternative_splice_segment
FT /note= "developmentally regulated"
FT region 1405..1480
FT /label= C-terminal_region
PN WO9210518-A.
PD 25-JUN-1992.
PF 27-NOV-1991; 009055.
PR 07-DEC-1990; US-624135.
PA (UYYA ) UNIV YALE.
PI Artavanis-Tsakonas S, Rothberg JM;
DR WPI; 92-234590/28.
DR N-PSDB; Q25811.
PT SLIT protein and sequence elements for treating
PT neurodegenerative disease - useful for Alzheimer's disease,
PT nerve damage and Parkinson's disease, for diagnosis of cancer
PS Claim 1; Page 84-89; 122pp; English.

CC The SLIT protein is necessary for normal development of the midline

CC of the CNS, partic. the midline glial cells, and for the 

CC concomitant formation of the commissural axon pathways. The process

CC is dependent on the level of SLIT protein expression. It appears 

CC that SLIT protein is excreted by the midline glial cells where it 

CC is synthesised and is eventually associated with the surfaces of 

CC axons that traverse them. The SLIT protein is tightly localised to

CC the muscle attachment sites and to the sites of contact between 

CC adjacent pairs of cardioblasts as they coalesce to form the lumen 

CC of the larval heart. The SLIT protein defines a new set of 

CC molecules (TAGONS) which play a key role in axon outgrowth and 

CC pathfinding. SLIT can be used as a nerve regenerative in

CC neurodegenerative diseases such as Alzheimer's Disease, spinal cord

CC injuries, brain injuries, crushed (optic) nerve, amyotrophic lateral

CC sclerosis, diabetes-caused nerve damage, Parkinson's Disease,

CC strokes, epilepsy, multiple sclerosis, paraplegia, retinal

CC degeneration and facial nerve damage. The 4 "Flank-Leucine-rich

CC region-Flank" domains, the C-terminal region and part of the

CC alternative splice segment (i.e. GEGSTEPFTVT) are all individually

CC claimed as are molecules comprising at least 1 Flank-LRR-Flank domain

CC and at least 1 EGF-like repeat element from the SLIT protein. 

CC See also R29102. 

SQ Sequence 1480 AA; 

Query Match 36.8%; Score 4159; DB 5; Length 1480;

Best Local Similarity 43.9%; Pred. No. 1.40e-293; 

Matches 592; Conservative 335; Mismatches 370; Indels 52; Gaps 33; 

Db 105 lelqgnnltviyetdfqrltklrmlqltdnqihtiernsfqdlvslerldisnnvittvg 164 

hi I::: I II I I |:| |::: : I I I II : I : I I ':: 
Qy 84 LQLMENKISTIERGAFQDLKELERLRLNRNHLQLFPELLFLGTAKLYRLDLSENQIQAIP 143 

Db 165 rrvfkgaqslrslqldnnqitcldehafkglveleiltvnnnnltslphnifggvgrlra 224 

I: hll -Mil 111:1::: ll::| : 1 1 : 1 1 : 1 1 1 1 : 1 |: I : :||: 
Qy 144 RKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPKLRT 203 

Db 225 lrlsdnpfacdchlswlsrflrsatrlapytrcqspsqlkgqnvadlhdqefkcsglte- 283 

:|| I : 11111:111 II |:: Ml ; 1 1 : 1 : 1 : 1 1 1 : ; : :|| I 
Qy 204 FRLHSNNLYCDCHLAWLSDWLRKRPRVGLYTQCMGPSHLRGHNVAEVQKREFVCSDEEEG 263 

Db 284 hap-mecg-aenscphpcrcadgivdcreksltsvpvtlpddttdvrleqnfitelppks 341 

I : I : : II :| |:: Hill Ml :| II: |::|lill I :|| : 
Qy 264 HQSFMAPSCSVLHCPAACTCSNNIVDCRGKGLTEIPTNLPETITEIRLEQNTIKVIPPGA 323 

Db 342 fssfrrlrridlsnnnisriahdalsglkqlttlvlygnkikdlpsgvfkglgslrllll 401 
A IMMIIIIIIII II :l II: M I HIIIIIII :ll ::! II MM 
Qy 324 FSPYKKLRRIDLSNNQISELAPDAFQGLRSLNSLVLYGNKITELPKSLFEGLFSLQLLLL 383

Db 402 naneiscirkdafrdlhslsllslydnniqsvangtfdamkstnktvhlaknpficdcnlr 461 

III hhl lll:|ll:hlllllll::|::|:|ll ::::: IMi IIIIIIIM 
Qy 384 NANKINCLRVDAFQDLHNLNLLSLYDNKLQTIAKGTFSPLRAIQTMHLAQNPFICDCHLK 443 

Db 462 wvadylhknpietsgarcespkrmhrrriesvreekfkcs-wgelrmklsgecrmdsdcp 520 

hlllll llllllllll Ml: :|| :: Mil : I MM I || 
Qy 444 WLADYLHTNPIETSGARCTSPRRLANKRIGQIKSKKFRCSGTEDYRSKLSGDCFADLACP 503

Db 521 amchcegttvdctgrrlkeiprdiplhttelllndnelgrissdglfgrlphlvklelkr 580 

Mllllllll: ::|: || || |:|| ||;||: ; ; : :||:| |::: 
Qy 504 EKCRCEGTTVDCSNQKLNKIPEHIPQYTAELRLNNNEFTVLEATGIFKKLPQLRKINFSN 563 

Db 581 nqltgiepnaf egashiqe- - 1 - - -ql -g-enki-k - -e i-snkm fl 616 

I M || HUH : I | : | :M: | | : |:: : 

Qy 564 NKITDIEEGAFEGASGVNEILLTSNRLENVQHKMFKGLESLKTLMLRSNRITCVGNDSFI 623 

Db 617 glhqlktlnlydnqiscvmpgsfehlnsltslnlasnpfncnchlawfaecvrkkslngg 676 

II :: I : I ! 1 1 1 1 : I Ml: IMMII MUM MM MM : I 

Qy 624 GLSSVRLLSLYDNQITTVAPGAFDTLHSLSTLNLLANPFNCNCYLAWLGEWLRKKRIVTG 683 

Db 677 aarsgapskvrdlqikdlphsefkcssense-gclgdgycppsctctgtvvacsrnqlke 735 

:| I :::: I M: ;| I I : :| ; || I III II : II 
Qy 684 NPRCQKPYFLKEIPIQDVAIQDFTCDDGNDDNSCSPLSRCPTECTCLDTVVRCSNKGLKV 743

Db 736 iprgipaetsqvylesneieqihyerirhlrsltrldlsnnqitilsnytfanltklsrl 795 



:| : 1 1 1 = :::||::|:: : M : Ml : 1 1 1 1 M M III :|:M I I 
Qy 744 LPKGIPRDVTELYLDGNQFTLVPRE-LSNYKHLTLIDLSNNRISTLSNQSFSNMTQLLTL 802 

Db 796 iisynklqclqrhalsglnnlrvvslhgnrismlpeasfedlkslthialgsnplycdcg 855 

1 : 1 1 1 : 1 : 1 : ::: MMMIIII M:M:MI : I :| :| : I :| 1 1 1 1 1 1 
Qy 803 ILSYNRLRCIPPRTFDGLKSLRLLSLHGNDISVVPEGAFNDLSALSHLAIGANPLYCDCN 862

Db 856 lkwfsdwikldyvepgiarcaepeqmkdklilstpsssfvcrgrvrndilakcnacfeqp 915 

: MIM M MUM I M 111:1:111 | |:| | :||M|:|: | 
Qy 863 MQWLSDWVKSEYKEPGIARCAGPGEMADKLLLTTPSKKFTCQGPVDVNILAKCNPCLSNP 922 

Db 916 cqnqaqcvalpqreyqclcqpgyhgkhcefmidacygnpcrnnatctvle--egrfrcgc 973 

I h: I : I M I IM I: I II MM:: MM I M I 

Qy 923 CKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFWCIC 982 

Db 974 apgytgarcetniddclgeikcqnnatcidgvesykcecqpgfsgefcdtkiqfcspefn 1033 

MM II Mil : MMIM:::| I I I MIM MM M 
Qy 983 ADGFEGENCEVNVDDC-EDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDFCAQDLN 1041 

Db 1034 pcangakcmdhfthyscdcqagfhgtnctdniddcqnhmcqnggtcvdgindyqcrcpdd 1093 

II : MM : ||| :|: | : : 1 1 1 1 : : : | ||: | |::| | | ||: 

Qy 1042 PCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICPEG 1101 

Db 1094 ytgkyceghnmismmypqtspcqnheckhgvcfqpnaqgsdylcrchpgytgkwceylts 1153 

hi M : M MUM :MI I : :: Ml III I II I I 
Qy 1102 YSGLFCE-FSP-PMVLPRTSPCDNFDCQNGA--QCIVRINEPICQCLPGYQGEKCEKLVS 1157 

Db 1154 isfvhnnsfveleplrtrpeanvtivfssaeqngilmydgqdahlavelfngrirvsydv 1213 

:M::M::: : : ||::|:|: ::: |:;|||:| |: |:||||: Ml III 
Qy 1158 VNFINKESYLQIPSAKVRPQTNITLQIATDEDSGILLYKGDKDHIAVELYRGRVRASYDT 1217 

Db 1214 gnhpvstmysfemvadgkyqavellaikknftlrvdrglarsiinegsndylklttpmfl 1273 

Ml |:M I : II::: MM :::| II I :: I I ; h; :|::: 
Qy 1218 GSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITNLSKQSTLNFDSPLYV 1277 

Db 1274 gglpvdpaqqayknwqirnltsfkgcmkevwinhklvdfgnaqrqqkitpgcal-lege- 1331

IM : : : M III IM::: MM: | | ||| 
Qy 1278 GGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQDFQKVPMQTGILPGCEPCHKKVC 1337 

Db 1332 qqee-eddeqd-fm-d--et--phikeepv-dpclenkcrrgsrcvpnsnardgyqckck 1383 

: : I I : I : :: Ml III M: M || :| 
Qy 1338 AHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKCVHGT-CLPI-NAF-SYSCKCL 1394 

Db 1384 hgqrgrycdqgegstep-ptvt-aastcr 1410

I: I M I :l :: : II 
Qy 1395 EGHGGVLCDEEEDLFNPCQAIKCKHGKCR 1423 



RESULT 3 

ID W46967 standard; Protein; 228 AA. 

AC W46967; 

DT 06-JUL-1998 (first entry) 

DE Amino acid sequence of the specification.

KW Slit-like protein; human; diagnosis; treatment; brain-specific disease; 

KW cancer; antibody. 

OS Mus sp. 

PN J10087699-A. 

PD 07-APR-1998. 

PF 15-JUL-1997; 205351. 

PR 16-JUL-1996; JP-186219. 

PA (ASAH ) ASAHI KASEI KOGYO KK.

DR WPI; 98-267127/24.

DR N-PSDB; V16966. 

PT Human Slit-like protein - useful for diagnosis and treatment of 

PT brain-specific diseases and cancers 

PS Disclosure; Page 35; 45pp; Japanese. 

CC The present sequence appears in the specification. The specification 

CC describes a novel human slit-like protein (the mature protein is claimed 

CC in Claim 1). The slit-like polypeptide is useful for diagnosis and 

CC treatment of brain-specific diseases and cancers. Antibodies directed 

CC against the protein, or its fragments can also be used for diagnosing 

CC cancer. 

SQ Sequence 228 AA; 
Query Match 9.2%; Score 1039; DB 30; Length 228;

Best Local Similarity 58.7%; Pred. No. 7.55e-63;

Matches 132; Conservative 43; Mismatches 50; Indels 0; Gaps 0;



Db 2 edngillyngdndhiavelyqghvrvsydpgsypssaiysaetindgqfhtvelvtfdqm 61 

||| || |:||||| MM || |||::;|| 
Qy 1188 EDSGILLYKGDKDHIAVELYRGRVRASYDTGSHPASAIYSVETINDGNFHIVELLALDQS 1247 

Db 62 vnlsidggspmtmdnfgkhytlnseaplyvggmpvdvnsaafrlwqilngtsfhgcirnl 121

::||:l!hl: : |::|: III ::||||llll I Ml llllllllllll 
Qy 1248 LSLSVDGGNPKIITNLSKQSTLNFDSPLYVGGMPGKSNVASLRQAPGQNGTSFHGCIRNL 1307 

Db 122 yinnelqdftktqmkpgvvpgcepcrklyclhgicqpnatpgpvchceagwgglhcdqpv 181 

111:11111 I I I II III:: :| I 1 : 1 1 I III 

Qy 1308 YINSELQDFQKVPMQTGILPGCEPCHKKVCAHGTCQPSSQAGFTCECQEGWMGPLCDQRT 1367 

Db 182 dgpchghkcvhgkcvpldalayscqcqdgysgalcnqvgavaepc 226 
: II MINI hl::|::|: I :| :| |:: : :|| 

Qy 1368 NDPCLGNKCVHGTCLPINAFSYSCKCLEGHGGVLCDEEEDLFNPC 1412



RESULT 4

ID W46968 standard; Protein; 183 AA. 

AC W46968; 

DT 06-JUL-1998 (first entry) 

DE Amino acid sequence of the specification.

KW Slit-like protein; human; diagnosis; treatment; brain-specific disease; 

KW cancer; antibody. 

OS Rattus sp. 

PN J10087699-A. 

PD 07-APR-1998.

PF 15-JUL-1997; 205351.

PR 16-JUL-1996; JP-186219.

PA (ASAH ) ASAHI KASEI KOGYO KK.

DR WPI; 98-267127/24.

DR N-PSDB; V16967. 

PT Human Slit-like protein - useful for diagnosis and treatment of 

PT brain-specific diseases and cancers 

PS Disclosure; Pages 37-38; 45pp; Japanese. 

CC The present sequence appears in the specification. The specification 

CC describes a novel human slit-like protein (the mature protein is claimed 

CC in Claim 1). The slit-like polypeptide is useful for diagnosis and 

CC treatment of brain-specific diseases and cancers. Antibodies directed 

CC against the protein, or its fragments can also be used for diagnosing 

CC cancer. 

SQ Sequence 183 AA; 

Query Match 7.4%; Score 838; DB 30; Length 183;

Best Local Similarity 59.0%; Pred. No. 2.62e-48;

Matches 108; Conservative 33; Mismatches 42; Indels 0; Gaps 0;

Db 1 ghvrvsydpgsypssaiysaetindgqfhtvelvtfdqmvnlsidggspmtmdnfgkhyt 60 

N III || |:||lll llllll II I:::: ::||:|||:|: : |::|: 
Qy 1209 GRVRASYDTGSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITNLSKQST 1268

Db 61 lnseaplyvggmpvdvnsaafrlwqilngtsfhgcirnlyinnelqdftktqmkpgvvpg 120 

II Mlllllll I |::| 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 : 1 1 1 1 1 I | |;:|| 
Qy 1269 LNFDSPLYVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQDFQKVPMQTGILPG 1328 

Db 121 cepcrklyclhgicqpnatpgpvchceagwgglhcdqpvdgpchghkcvhgkcvpldala 180 

MM:! I || III:: :l I |: II I III : II |:||||| |:|::|:: 
Qy 1329 CEPCHKKVCAHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKCVHGTCLPINAFS 1388 

Db 181 ysc 183 

III 

Qy 1389 YSC 1391 



RESULT 5 

ID W68510 standard; Protein; 1872 AA. 
AC W68510; 

DT 06-JAN-1999 (first entry) 



DE Partial human Notch-3 protein.

KW Human; Notch3; transmembrane receptor; lateral inhibition; regulation; 

KW developmental cascade; neurogenic gene; mutant; neurological disorder; 

KW cerebral autosomal dominant arteriopathy; subcortical infarct; CADASIL; 

KW leukoencephalopathy. 



OS Homo sapiens.
FH Key Location/Qualifiers
FT Misc_difference 328
FT /note= "encoded by NAN"
FT Misc_difference 401
FT /note= "encoded by GNN"
FT Misc_difference 403
FT /note= "encoded by GNC"
FT Misc_difference 406
FT /note= "encoded by GNN"
FT Misc_difference 409
FT /note= "encoded by NNT"
FT Misc_difference 420
FT /note= "encoded by GNC"
FT Misc_difference 706
FT /note= "encoded by NNN"
FT Misc_difference 708
FT /note= "encoded by CCN"
FT Misc_difference 719
FT /note= "encoded by CGN"
FT Misc_difference 728
FT /note= "encoded by CNT"
FT Misc_difference 729
FT /note= "encoded by GTN"
FT Misc_difference 759..789
FT /note= "encoded by nnn"
FT Misc_difference 1425
FT /note= "encoded by GNA"
PN FR2751985-A1.
PD 06-FEB-1998.
PF 01-AUG-1996; 009733.
PR 01-AUG-1996; FR-009733.
PA (INRM ) INSERM INST NAT SANTE & RECH MEDICALE.

PI Bach JF, Bousser MG, Joutel A, Tournier Lasserve E; 

DR WPI; 98-133137/13.

DR N-PSDB; V57163. 

PT Human Notch3 nucleic acids - and methods for identifying 

PT pre-disposition to cerebral autosomal dominant arteriopathy with 

PT sub-cortical infarcts and leukoencephalopathy 

PS Claim 2; Fig 1a-1g; 42pp; French.

CC This sequence represents a partial human notch3 protein, a transmembrane 

CC receptor protein involved in lateral inhibition and regulating 

CC developmental cascades of neurogenic genes. Mutated Notch3 proteins 

CC are thought to be involved in neurological disorders, especially of the 

CC cerebral autosomal dominant arteriopathy with subcortical infarcts and 

CC leukoencephalopathy (CADASIL) type. Blocking expression of a mutated 

CC Notch3 gene or by substitution therapy with non-mutated Notch3 gene or

CC protein can be used to treat CADASIL or related disorders. 

SQ Sequence 1872 AA; 

Query Match 6.5%; Score 740; DB 36; Length 1872;

Best Local Similarity 31.4%; Pred. No. 2.86e-41;



Matches [values not recoverable];

[Alignment rows Db 94/Qy 916, Db 152/Qy 975, Db 211/Qy 1035 and the Db 271 row: residue text not recoverable]

Qy 1095 TCICPEGYSGLFCEFSPPMVLPRTSPC-DNFDCQNGAQCIVRINEPICQCLPGYQGEKCE 1153

Db 324 qdvdxcsiganpcehlgrc-vntqgsflcqcgrgy-tgpr-cetdvneclsgpcrnqa-t 379 

I I :: :: I I :: I : :| | : | : : 
Qy 1154 KLVSVNFINKESYLQIPSAKVRPQTNITLQIATDEDSGILLYKGDKDHIAVELYRGRVRA 1213

Db 380 cldrigqftc-ic-magft-gtycxvxi-dxcqx-spcvnggvckxrvn-gfsct-c-ps 431 

:: : I : : I : I : | | hi I | : | | 
Qy 1214 SYDTGSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITNLSKQSTLNFDS 1273

Db 432 g-fsgstc-qldvdecastpcrngakcvd-qpdgyecrcaegfegtlcdrnv-ddcsp-d 486 

: I: :| :| :||: : I : |: : : | | 
Qy 1274 PLYVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQDFQKVPMQTGILPGCEPCH 1333 

Db 487 p--chhgrc-vdgiasfscacapgytgtrcesqvde-crsqpcrhggkcldlvd-kylcr 541 
I I! I : 1:1: I | I : : : : I : I I ! 1 1 : |: 

Qy 1334 KKVCAHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 1392

Db 542 cpsgttgvncevnid--d-casnpctfgvcr-dginr-ydcvcqpgftgplcnveinec 595
I I II I: : I ' I : I I II |: : I I I :|:|| |: ||: 
Qy 1393 CLEGHGGVLCDEEEDLFNPCQAIKCKHGKCRLSGLGQPY-CECSSGYTGDSCDREIS-C 1449 



RESULT 6 

ID W49698 standard; Protein; 2321 AA. 

AC W49698; 

DT 21-DEC-1998 (first entry) 

DE Human Notch3 protein.

KW Human; Notch3; transmembrane receptor; lateral inhibition; regulation; 

KW developmental cascade; neurogenic gene; mutant; neurological disorder; 

KW cerebral autosomal dominant arteriopathy; subcortical infarct; CADASIL; 

KW leukoencephalopathy; therapy. 

OS Homo sapiens.

PN FR2751986-A1.

PD 06-FEB-1998.

PF 16-APR-1997; 004680. 

PR 01-AUG-1996; FR-009733. 

PA (INRM ) INSERM INST NAT SANTE & RECH MEDICALE.

PI Bach JF, Bousser MG, Joutel A, Tournier Lasserve E;

DR WPI; 98-133138/13.

DR N-PSDB; V57001.

PT Human Notch3 nucleic acids - and methods for identifying

PT pre-disposition to cerebral autosomal dominant arteriopathy with 

PT sub-cortical infarcts and leukoencephalopathy 

PS Claim 2; Fig 1.1-1.8; 45pp; French. 

CC This sequence represents the human Notch3 protein, a transmembrane

CC receptor protein involved in lateral inhibition and regulating

CC developmental cascades of neurogenic genes. Mutated Notch3 proteins

CC are thought to be involved in neurological disorders, especially of

CC the cerebral autosomal dominant arteriopathy with subcortical infarcts

CC and leukoencephalopathy (CADASIL) type. Blocking expression of a

CC mutated Notch3 gene or by substitution therapy with non-mutated Notch3

CC gene or protein can be used to treat CADASIL or related disorders.

SQ Sequence 2321 AA; 

Query Match 6.5%; Score 734; DB 36; Length 2321;

Best Local Similarity 31.5%; Pred. No. 7.69e-41;

Matches 170; Conservative 100; Mismatches 227; Indels 42; Gaps 36; 

Db 160 decrvgepcrhggtclntpgsf-rcqcpagytgplcenpavpcapspcrnggtcr-qsgd 217 

: I :::H:: III : I I II II |: I |: I :| ::||::||||: |: 
Qy 916 NPC-LSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGE 974

Db 218 lt-ydcaclpgfegqncevnvddcpghrclnggtcvdgvntyncqcppewtgqfctedvd 276 

: I I lllhlllllllll : I I illllhl I I llll lh:| I :| 
Qy 975 EDGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLD 1034 

Db 277 ecqlqpnachnggtcfntlgghscvcvngwtgescsqniddcatavcfhgatchdrvasf 336 

I : M:: ::||| I :|| I I I :: 

Qy 1035 FCAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGY 1094 

Db 337 ycacpragktgllchlddacv- - -snpchedaicdtnp- - -vn-graictcppgf tggacd 389 

Ml I :||:| : : I ::|| :: |: : | :|| | ||: | |: 



Qy 1095 TCICPEGYSGLFCEFSPPMVLPRTSPC-DNFDCQNGAQCIVRINEPICQCLPGYQGEKCE 1153 

Db 390 qdvdecsiganpcehlgrc-vntqgsflcqcgrgy-tgpr-cetdvneclsgpcrnqa-t 445 

I I :: :: I I :: I : :| I : I : : 
Qy 1154 KLVSVNFINKESYLQIPSAKVRPQTNITLQIATDEDSGILLYKGDKDHIAVELYRGRVRA 1213 

Db 446 cldrigqftc-ic-magft-gtycevdi-decqs-spcvnggvckdrvn-gfsct-c-ps 497 

I :: : I : : I : |:: II I |:|| I I ; I I 
Qy 1214 SYDTGSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITNLSKQSTLNFDS 1273

Db 498 g-fsgstc-qldvdecastpcrngakcvd-qpdgyecrcaegfegtlcdrnv-ddcsp-d 552 

: |: :| :| :||: : I : |: : : | | 
Qy 1274 PLYVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQDFQKVPMQTGILPGCEPCH 1333 

Db 553 p--chhgrc-vdgiasfscacapgytgtrcesqvde-crsqpcrhggkcldlvd-kylcr 607 

llll : 1:1:1 I I I I : : : : I : I 1 1 1 1 : I |: 
Qy 1334 KKVCAHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 1392

Db 608 cpsgttgvncevnid--d-casnpctfgvcr-dginr-ydcvcqpgftgplcnveinec 661

I I II I: : I : I : I III |: : I II :|:|| h lh I 
Qy 1393 CLEGHGGVLCDEEEDLFNPCQAIKCKHGKCRLSGLGQPY-CECSSGYTGDSCDREIS-C 1449



RESULT 7 

ID W11719 standard; Protein; 727 AA.

AC W11719; 

DT 28-APR-1997 (first entry) 

DE C-Delta-1 polypeptide. 

KW C-Delta-1; cell proliferation; nervous system disorder; 

KW tissue regeneration; Notch; cervix cancer; breast cancer; 

KW lung cancer; colon cancer; melanoma; seminoma; 

KW neurogenesis; therapy. 

OS Gallus sp. 

FH Key Location/Qualifiers

FT domain 184..228

FT /label= DSL

FT domain 229..261

FT /label= EGF1

FT domain 262..292

FT /label= EGF2

FT domain 293..332

FT /label= EGF3

FT domain 333..370

FT /label= EGF4

FT domain 371..409

FT /label= EGF5

FT domain 410..447

FT /label= EGF6

FT domain 448..485

FT /label= EGF7

FT domain 486..523

FT /label= EGF8

FT domain 524..534

FT /label= EGF9

FT domain 555..579

FT /label= TM

FT /note= "transmembrane domain"

PN WO9701571-A1.

PD 16-JAN-1997.

PF 28-JUN-1996; 011178.

PR 28-JUN-1995; US-000589.

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY.

PA (UYYA ) UNIV YALE.

PI Artavanis-Tsakonas S, Gray GE, Henrique D, Ish-Horowicz D;

PI Lewis J; 

DR WPI; 97-100159/09. 

DR N-PSDB; T58897. 

PT New vertebrate Delta protein, DNA and antibodies - for treating and 

PT preventing cancer, nervous system disorders and for tissue 

PT regeneration 

PS Disclosure; Fig 2; 135pp; English.

CC C-delta-1 polypeptide (W11719) is the chick homologue of Drosophila 

CC Delta, a protein that binds to Notch protein. Expression of 
CC C-Delta-1 correlates with onset of neurogenesis. The C-delta-1

CC amino acid sequence was deduced from a cDNA clone (T58897) obtd. 

CC from chick stage 4-6 embryos. An alternatively spliced variant 

CC (W00876) was also isolated, and mouse (W11720) and human (W11721- 

CC 38) Delta-1 polypeptides have been identified. Delta-1 proteins

CC can be used to treat or prevent disorders characterised by 

CC increased Notch activity, such as cervical, breast, lung or colon 

CC cancer, melanoma or seminoma, and nervous system disorders or to 

CC promote tissue regeneration and repair. 

SQ Sequence 727 AA; 

Query Match 5.9%; Score 666; DB 21; Length 727;

Best Local Similarity 39.5%; Pred. No. 5.53e-36; 



Matches [values not recoverable];

[Alignment rows Db 303/Qy 921, Db 360/Qy 981, Db 419/Qy 1040 and Db 478/Qy 1100: residue text not recoverable]



RESULT 8 

ID W00876 standard; Protein; 740 AA.

AC W00876;

DT 28-APR-1997 (first entry)

DE C-Delta-1 polypeptide (alternatively spliced variant).

KW C-Delta-1; cell proliferation; nervous system disorder;

KW tissue regeneration; Notch; cervix cancer; breast cancer; 

KW lung cancer; colon cancer; melanoma; seminoma; 

KW neurogenesis; therapy.

OS Gallus sp. 






FH Key Location/Qualifiers
FT domain 184..228
FT /label= DSL
FT domain 229..261
FT /label= EGF1
FT domain 262..292
FT /label= EGF2
FT domain 293..332
FT /label= EGF3
FT domain 333..370
FT /label= EGF4
FT domain 371..409
FT /label= EGF5
FT domain 410..447
FT /label= EGF6
FT domain 448..485
FT /label= EGF7
FT domain 486..523
FT /label= EGF8
FT domain 524..534
FT /label= EGF9
FT domain 555..579
FT /label= TM
FT /note= "transmembrane domain"
PN WO9701571-A1.
PD 16-JAN-1997.
PF 28-JUN-1996; 011178.
PR 28-JUN-1995; US-000589.
PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY.
PA (UYYA ) UNIV YALE.
PI Artavanis-Tsakonas S, Gray GE, Henrique D, Ish-Horowicz D;



PI Lewis J; 

DR WPI; 97-100159/09. 

DR N-PSDB; T58898. 

PT New vertebrate Delta protein, DNA and antibodies - for treating and 

PT preventing cancer, nervous system disorders and for tissue 

PT regeneration

PS Disclosure; Fig 2; 135pp; English. 

CC C-delta-1 polypeptide (W00876) is the chick homologue of Drosophila 

CC Delta, a protein that binds to Notch protein. Expression of 

CC C-Delta-1 correlates with onset of neurogenesis. The C-delta-1

CC amino acid sequence was deduced from a cDNA clone (T58898) obtd. 

CC from chick stage 4-6 embryos. A shorter version (W58877) of 

CC C-Delta-1, lacking the 12 C-terminal amino acids of the longer

CC version, was also isolated, and mouse (W11720) and human (W11721-

CC 38) Delta-1 polypeptides have been identified. Delta-1 proteins

CC can be used to treat or prevent disorders characterised by 

CC increased Notch activity, such as cervical, breast, lung or colon 

CC cancer, melanoma or seminoma, and nervous system disorders or to 

CC promote tissue regeneration and repair.

SQ Sequence 740 AA; 

Query Match 5.9%; Score 666; DB 21; Length 740;

Best Local Similarity 39.5%; Pred. No. 5.53e-36;



Matches [values not recoverable];

[Alignment rows Db 303/Qy 921, Db 360/Qy 981, Db 419/Qy 1040 and Db 478/Qy 1100: residue text not recoverable]



RESULT 9 

ID W18351 standard; protein; 1036 AA. 

AC W18351; 

DT 11-FEB-1998 (first entry)

DE Proliferation and differentiation suppression polypeptide. 

KW Proliferation; differentiation; suppression; human; delta-1; 

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour; 

KW immunosuppression.

OS Homo sapiens.

PN WO9719172-A1.

PD 29-MAY-1997. 

PF 15-NOV-1996; J03356. 

PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 5; Page 66-71; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression.

SQ Sequence 1036 AA; 

Query Match 5.8%; Score 650; DB 25; Length 1036; 

Best Local Similarity 36.7%; Pred. No. 7.61e-35; 
Matches 88; Conservative 53; Mismatches 85; Indels 14; Gaps 7;

Db 307 haclsdpchnrgscketslgf-ececspgwtgptcstniddcspnncshggtcq--d-lv 362 

::MI:i: I |:|: :: I I |: I I I I I :| I |||||: : 
Qy 916 NPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEE 975 

Db 363 ngfkcvcppqwtgktcqldaneceakpcvnakscknliasyycdclpgwmgqncdinind 422 

:|| hi: I I::: ::|| : I I :| : I :| I I I |: |: ::: 
Qy 976 DGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDF 1035 

Db 423 c---lgqcqndascrdlvngyrcicppgyagdhcerdidecasnpclngghcqneinrfq 479 

I I 11:1: I :!::! I III hl|: |:|:| I I Ihll : :| : 
Qy 1036 CAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYT 1095 

Db 480 clcptgfsgnlcqld---i---dy-cepnpcqngaqcynrasdyfckcpedyegkncshl 532 
hll hi! :h: : I: IIMIII I :| I |:| :| I 
Qy 1096 CICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEKL 1155



RESULT 10

ID W18352 standard; protein; 1187 AA. 

AC W18352; 

DT 11-FEB-1998 (first entry)

DE Proliferation and differentiation suppression polypeptide.

KW Proliferation; differentiation; suppression; human; delta-1;

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour;

KW immunosuppression.

OS Homo sapiens.

PN WO9719172-A1.

PD 29-MAY-1997.

PF 15-NOV-1996; J03356.

PR 30-NOV-1995; JP-311811.

PR 17-NOV-1995; JP-299611.

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 6; Page 71-76; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 
CC blood formation, e.g. after immunosuppression.

SQ Sequence 1187 AA;

Query Match 5.8%; Score 650; DB 25; Length 1187;

Best Local Similarity 36.7%; Pred. No. 7.61e-35;

Matches 88; Conservative 53; Mismatches 85; Indels 14; Gaps 7; 

Db 307 haclsdpchnrgscketslgf-ececspgwtgptcstniddcspnncshggtcq--d-lv 362 

::IM:|| I :: I I |: II I I I :| I Mill: : 

Qy 916 NPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEE 975 

Db 363 ngfkcvcppqwtgktcqldaneceakpcvnakscknliasyycdclpgwmgqncdinind 422 

:H 1:1: I I::: ::|| : I I :| : I :| I I I |: |: ::: 
Qy 976 DGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDF 1035 

Db 423 c---lgqcqndascrdlvngyrcicppgyagdhcerdidecasnpclngghcqneinrfq 479 

I I Ihh I :|::l I III hlh hhl I I Ihll : :| : 
Qy 1036 CAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYT 1095 

Db 480 clcptgfsgnlcqld---i---dy-cepnpcqngaqcynrasdyfckcpedyegkncshl 532

1:11 1:11 :l" : h IMIMI I :: :| I hi :| I 
Qy 1096 CICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEKL 1155 
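
The percentages printed for a hit can be cross-checked against the counts on its "Matches" line. The following is a minimal Python sketch, assuming (the report does not state this) that Best Local Similarity is the number of exact matches divided by the total number of aligned columns, i.e. matches + conservative + mismatches + indels. The function name is hypothetical. Under that assumption the counts for RESULT 10 reproduce the 36.7% printed above.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent of aligned columns that are exact matches.

    Assumption: every alignment column is counted exactly once as a match,
    a conservative substitution, a mismatch, or an indel.
    """
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Counts taken from the "Matches" line of RESULT 10.
print(round(best_local_similarity(88, 53, 85, 14), 1))
```

The same check can be applied to any other hit by substituting the four counts from its own "Matches" line.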



RESULT 11 

ID W18354 standard; protein; 1218 AA. 
AC W18354; 

DT 11-FEB-1998 (first entry)



DE Proliferation and differentiation suppression polypeptide. 

KW Proliferation; differentiation; suppression; human; delta-1; 

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour; 

KW immunosuppression.

OS Homo sapiens.

FH Key Location/Qualifiers

FT Peptide 1..31
FT /label= Signal

FT Protein 32..1218

FT /label= Differentiation_suppression_protein

PN WO9719172-A1.

PD 29-MAY-1997. 

PF 15-NOV-1996; J03356. 

PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27.

DR N-PSDB; T70175. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 15; Page 83-91; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the

CC prevention and control of disorders involving undifferentiated

CC cells, such as leukaemia and malignant tumours, and improvement of

CC blood formation, e.g. after immunosuppression.

SQ Sequence 1218 AA;

Query Match 5.8%; Score 655; DB 25; Length 1218;

Best Local Similarity 29.8%; Pred. No. 3.35e-35; 

Matches 164; Conservative 111; Mismatches 226; Indels 50; Gaps 40; 

Db 305 pclnggtcsntgpdkyqcscpegysgpnceiaehaclsdpchnrgsc--ketsl-gfece 361 

II I III:: I 1 : 1 : 1 1 h I :h:: llhhll : hi II II I 
Qy 922 PCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFWCI 981 

Db 362 cspgwtgptcstniddcspnncshggtcqdlvngfkcvcppqwtgktcqldanqc-ea-k 419 

hill hill hi : :ll I :| : hllh III: : I : : 
Qy 982 CADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDFCAQDLN 1041 

Db 420 pcvnakscknlia-syycdclpgwmgqncdinindclgq-cqndascrdlvngyrcicpp 477 

II : II::: III II :|::|lh::|l I II I I Mil Mil 
Qy 1042 PCQHDSKCI-LTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICPE 1100 

Db 478 gyagdhce--rdidecasnpclngghcqneinrfqclcptgfsgnlcqldidycepnpcq 535 

I: II : ::|| : III lh :: :|| III 
Qy 1101 GYSGLFCEFSPPMVLPRTSPC-DNFDCQNG-A--QCIVR--INEPICQ-----CLPG-YQ 1148

Db 536 ngaqcynrasdyfckcpedyegkncshlkdhcrtt-pcevidsctvamasndtpe-gvry 593 

I I : I I : I I : :: : .| : : : | :| 
Qy 1149 -GEKCEKLVSVNFIN-KESYLQIPSAKVRPQTNITLQIATDEDSGILLYKGDKDHIAVEL 1206 

Db 594 issnvcgphgkcksqsggkftcdc-nkg-ftgtychenindsesnpcrnggt-cidgvns 650 

: I :: :::::: | | | :::| | : | | :: 

Qy 1207 YRGRVRASYDTGSHPASAIYSVETINDGNFHIVE-LLALDQSLSLSVDGGNPKIITNLSK 1265 

Db 651 ykcicsdg--wegayce-tnindcsqnpchnggtcrdlvndfycdck-ngw-k-gk-tch 703 

: I: h :|: I I :|| : : ; ::| : | : | 
Qy 1266 QSTLNFDSPLYVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQDFQKVPMQTGI 1325 

Db 704 srdsq-cdeatcnnggtcydegda-fkcmcpggwegttcniarnssclpnpchnggtcv- 760 

: I l:IN ::l I I I II I h I :|| I I :| lh 
Qy 1326 LPGCEPCHKKVCAHG-TCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKCVHG-TCLP 1383 

Db 761 vngesftcvckegwegpicaqntn--d-csphpcynsgtc-vdg-dnwyrcecapgfagp 815 

:|: |::| I 1 1 I : I : : : : I : I : I I : I I |||::|::| 
Qy 1384 INAFSYSCKCLEGHGGVLCDEEEDLFNPCQAIKC-KHGKCRLSGLGQPY-CECSSGYTGD 1441 

Db 816 dcrininecqs 826 

I :h h:
Qy 1442 SCDREIS-CRG 1451



RESULT 12

ID W05833 standard; Protein; 1218 AA. 

AC W05833; 

DT 28-JAN-1997 (first entry) 

DE Human Serrate-1 (HJ1).

KW Serrate-1; human Jagged-1; HJ1; Notch; cell differentiation; 

KW cell fate; central nervous system; cancer; tissue repair; therapy; 

KW diagnosis; antibody. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT domain 1..1067
FT /label= Extracellular_domain
FT peptide 14..29
FT /label= Sig_peptide
FT domain 185..229
FT /label= DSL
FT /note= "region of homology with Drosophila Delta
FT and Serrate, predicted to mediate binding
FT with Notch"
FT domain 234..896
FT /label= ELR
FT /note= "epidermal growth factor-like repeat domain"
FT region 234..264
FT /label= ELR1
FT region 265..299
FT /label= ELR2
FT region 300..339
FT /label= ELR3
FT region 340..377
FT /label= ELR4
FT region 378..415
FT /label= ELR5
FT region 416..453
FT /label= ELR6
FT region 454..490
FT /label= ELR7
FT region 491..528
FT /label= ELR8
FT region 529..566
FT /label= ELR9
FT region 567..598
FT /label= Partial_ELR
FT region 599..632
FT /label= Partial_ELR
FT region 633..670
FT /label= ELR10
FT region 671..708
FT /label= ELR11
FT region 709..747
FT /label= ELR12
FT region 748..785
FT /label= ELR13
FT region 786..823
FT /label= ELR14
FT region 824..862
FT /label= ELR15
FT region 863..879
FT /label= Partial_ELR
FT region 880..896
FT /label= Partial_ELR
FT domain 1068..1089
FT /label= Transmembrane_domain
FT domain 1090..1218
FT /label= Intracellular_domain

PN WO9627610-A1. 

PD 12-SEP-1996. 

PF 07-MAR-1996; U03172.

PR 07-MAR-1995; US-400159.

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY.

PA (UYYA ) UNIV YALE.



PI Artavanis-Tsakonas S, Gray GE, Henrique DMP, Ish-Horowicz D; 

PI Lewis JH, Mann RS, Myat AM; 

DR WPI; 96-425379/42.

DR N-PSDB; T40090.

PT Vertebrate Serrate protein and related DNA - used to treat or

PT prevent malignancies characterised by increased Notch activity.

PS Claim 4; Page 95-98; 161pp; English.

CC Human Serrate-1 (W05833) and human Serrate-2 (W05833) are ligands 

CC for the zygotic neurogenic locus Notch, and are believed to play a 

CC major role in determining cell fates (differentiation) in the 

CC central nervous system. Their amino acid sequences were deduced 

CC from cDNA clones (see also T40090-91) isolated from human foetal 

CC brain cDNA libraries. The proteins, antibodies raised to them, 

CC and encoding nucleic acids can be used in the detection of 

CC Serrate sequences and in the treatment of disorders of cell fate 

CC or differentiation, partic. cancer, nervous system disorders 

CC and in tissue repair or regeneration.

SQ Sequence 1218 AA;

Query Match 5.8%; Score 650; DB 19; Length 1218;

Best Local Similarity 36.7%; Pred. No. 7.61e-35; 

Matches 88; Conservative 53; Mismatches 85; Indels 14; Gaps 7; 

Db 338 haclsdpchnrgscketslgf-ececspgwtgptcstniddcspnncshggtcq--d-lv 393 

-111:11 I hi: :; I I |: | | | | :| | ||||| ; : 
Qy 916 NPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEE 975 

Db 394 ngfkcvcppqwtgktcqldaneceakpcvnakscknliasyycdclpgwmgqncdinind 453 

:M hh I !::: ::|| : I I :| : | :| | | | |: |: ::: 
Qy 976 DGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDF 1035 

Db 454 c---lgqcqndascrdlvngyrcicppgyagdhcerdidecasnpclngghcqneinrfq 510 

I I N:|: I :|::| | III |:||: |:|:| I | Id : :| : 
Qy 1036 CAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYT 1095 

Db 511 clcptgfsgnlcqld---i---dy-cepnpcqngaqcynrasdyfckcpedyegkncshl 563

Ml 1:11 :h: : |: HUM I :: :| I |:| :| | 
Qy 1096 CICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEKL 1155 



RESULT 13 

ID W44301 standard; Protein; 1218 AA. 

AC W44301; 

DT 19-JUN-1998 (first entry) 

DE Human serrate 1. 

KW Human; serrate 2; regulation; stem cell; differentiation; neoplasm; 

KW leukaemia; endothelial cell; tumour. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT Peptide 1..31

FT /label= Signal

FT Protein 32..1218

FT /label= Serrate-1

PN WO9802458-A1. 

PD 22-JAN-1998. 

PF 11-JUL-1997; J02414.

PR 14-MAY-1997; JP-124063.

PR 16-JUL-1996; JP-186220.

PA (ASAH ) ASAHI KASEI KOGYO KK.

PI Itoh A, Sakano S; 

DR WPI; 98-110528/10. 

DR N-PSDB; V15201. 

PT Human serrate-2 gene expression products - used to regulate stem 

PT cell differentiation, useful in treating neoplasms, e.g. leukaemia 

PS Disclosure; Page 77-86; 103pp; Japanese. 

CC The present sequence represents human serrate 1, from the present 

CC invention which describes human serrate 2. The present invention also

CC describes a method for the preparation of the polypeptides, and 

CC antibodies binding to the polypeptide and its fragments. The polypeptide 

CC and its fragments expressed by the serrate-2-gene can be used to inhibit 

CC stem (especially blood stem) cell differentiation and to inhibit 

CC endothelial cell growth. They may be incorporated in a cell culture 

CC media for culturing undifferentiated stem cells. They can also be used
CC for treatment of neoplasms such as leukaemia. The antibodies can be used
CC for the diagnosis of malignant tumours.
SQ Sequence 1218 AA;

Query Match 5.8%; Score 650; DB 29; Length 1218; 

Best Local Similarity 36.7%; Pred. No. 7.61e-35; 

Matches 88; Conservative 53; Mismatches 85; Indels 14; Gaps 7; 

Db 338 haclsdpchnrgscketslgf-ececspgwtgptcstniddcspnncshggtcq--d-lv 393 

::Hh!l I hi: :: I I |: I I I I I :| I Mill: : 
Qy 916 NPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEE 975

Db 394 ngfkcvcppqwtgktcqldaneceakpcvnakscknliasyycdclpgwmgqncdinind 453 

:M 1:1: I |::: ::|| : I I :| : I :| I I I |: |: ::: 
Qy 976 DGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDF 1035 






Db 454 c---lgqcqndascrdlvngyrcicppgyagdhcerdidecasnpclngghcqneinrfq 510

I I ll:h I :|::| I III 1 : 1 1 : l:|:| I I l|:|| : :| : 
Qy 1036 CAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYT 1095



Db 511 clcptgfsgnlcqld---i---dy-cepnpcqngaqcynrasdyfckcpedyegkncshl 563

Ml hll :|:: : |: lllllll I :: :| I |:| ;| I 
Qy 1096 CICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEKL 1155



RESULT 14 

ID W18348 standard; protein; 520 AA. 

AC W18348; 

DT 11-FEB-1998 (first entry)

DE Proliferation and differentiation suppression polypeptide.

KW Proliferation; differentiation; suppression; human; delta-1;

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour;

KW immunosuppression.

OS Homo sapiens.

PN WO9719172-A1.

PD 29-MAY-1997. 

PF 15-NOV-1996; J03356. 

PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK.

PI Itoh A, Sakano S;

DR WPI; 97-298110/27.

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress

PT proliferation and differentiation of undifferentiated human blood

PT cells

PS Claim 3; Page 59-61; 114pp; Japanese.

CC The present sequence represents a polypeptide which suppresses

CC proliferation and differentiation of undifferentiated cells such

CC as neurons and blood cells. The polypeptide may be used for the 

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression.

SQ Sequence 520 AA;

Query Match 5.7%; Score 644; DB 25; Length 520;

Best Local Similarity 38.2%; Pred. No. 2.03e-34;

Matches 91; Conservative 53; Mismatches 81; Indels 13; Gaps 9; 

Db 275 kpckngatctntgqgsytcscrpgytgatcelgidecdpspcknggsc--td-lensysc 331

:HII :l : I |:| |: I |;: ] | ;:|||:||:| : |::; I 
Qy 921 NPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFWC 980 

Db 332 tcppgfygkicelsamtcadgpcfnggrcsdspdggyscrcpvgysgfncekkidycs-s 390 

: II I II:: II I I : I |: : Ml M I Ml: 
Qy 981 ICADGFEGENCEVNVDDCEDNDCENNSTCVDGINN-YTCLCPPEYTGELCEEKLDFCAQD 1039 

Db 391 -spcsngakcvdlgdaylcrcqagfsgrhcddnvddcasspcanggtcrdgvndfsctcp 449 

:|| : :lh :: I I :|: I III : III : I lh I hll ::| II 
Qy 1040 LNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICP 1099

Db 450 pgytgrnc--sap-v-sr---cehapchngatcherghryvcecargyggpncqfllp 500 

11:1 I hi I M h: hill I I = Ml II I M h: 
Qy 1100 EGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEKLVS 1157
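
Each alignment row carries a start coordinate, a stretch of residues with "-" marking gaps, and an end coordinate. A quick consistency check for a transcribed row is that the end equals the start plus the number of residue letters minus one. The Python sketch below illustrates that check; the helper names are hypothetical, and it assumes that gap characters are the only non-letters in a row.

```python
def residues_in_row(row):
    """Count residue letters in an alignment row, ignoring gap characters."""
    return sum(1 for ch in row if ch.isalpha())


def row_is_consistent(start, row, end):
    """True when the end coordinate matches the start plus the residue count."""
    return start + residues_in_row(row) - 1 == end


# Last Db and Qy rows of the RESULT 14 alignment.
db_row = "pgytgrnc--sap-v-sr---cehapchngatcherghryvcecargyggpncqfllp"
qy_row = "EGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEKLVS"
print(row_is_consistent(450, db_row, 500))
print(row_is_consistent(1100, qy_row, 1157))
```

A row that fails this check has usually lost or gained a character in transcription, most often a gap run or a doubled letter.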



RESULT 15 

ID W18349 standard; protein; 702 AA. 
AC W18349; 

DT 11-FEB-1998 (first entry)

DE Proliferation and differentiation suppression polypeptide.

KW Proliferation; differentiation; suppression; human; delta-1;

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour;

KW immunosuppression.

OS Homo sapiens.

PN WO9719172-A1.

PD 29-MAY-1997. 

PF 15-NOV-1996; J03356. 

PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 
PT proliferation and differentiation of undifferentiated human blood 
PT cells

PS Claim 4; Page 61-64; 114pp; Japanese.
CC The present sequence represents a polypeptide which suppresses
CC proliferation and differentiation of undifferentiated cells such
CC as neurons and blood cells. The polypeptide may be used for the
CC prevention and control of disorders involving undifferentiated
CC cells, such as leukaemia and malignant tumours, and improvement of
CC blood formation, e.g. after immunosuppression.
SQ Sequence 702 AA;

Query Match 5.7%; Score 644; DB 25; Length 702;

Best Local Similarity 38.2%; Pred. No. 2.03e-34;

Matches 91; Conservative 53; Mismatches 81; Indels 13; Gaps 9;



Db 275 kpckngatctntgqgsytcscrpgytgatcelgidecdpspcknggsc--td-lensysc 331

Ml M : | |:l h Ml MIMM : I::: I 
Qy 921 NPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFWC 980



Db 332 tcppgfygkicelsamtcadgpcfnggrcsdspdggyscrcpvgysgfncekkidycs-s 390 

h II I M || | | : | |: : |:| !| :| || :|;|; 
Qy 981 ICADGFEGENCEVNVDDCEDNDCENNSTCVDGINN-YTCLCPPEYTGELCEEKLDFCAQD 1039 

Db 391 -spcsngakcvdlgdaylcrcqagfsgrhcddnvddcasspcanggtcrdgvndfsctcp 449 

M : :||: :: I MIM III Mil M M I hll ::l II 
Qy 1040 LNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICP 1099 

Db 450 pgytgrnc--sap-v-sr---cehapchngatcherghryvcecargyggpncqfllp 500

1:1 I hi I :l h: hill I I : Ml I I M h: 
Qy 1100 EGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEKLVS 1157 



Search completed: Fri May 28
Job time: 167 secs.



Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 08:25:45 1999; MasPar time 61.26 Seconds
997.388 Million cell updates/sec

Tabular output not generated.

Title: MJS-09-191-647-2

Description: (1-1525) from US09191647.pep

Perfect Score: 11299

Sequence: 1 MRGVGWQMLSLSLGLVLAIL SSFVDEVEKWKCGCTRCVS 1525

Scoring table: PAM 150
Gap 11

Searched: 122810 seqs, 40068593 residues

Post-processing: Minimum Match 0%

Listing first 45 summaries

Database: pir60

1:pir1 2:pir2 3:pir3 4:pir4

Statistics: Mean 55.224; Variance 119.421; scale 0.462

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.



SUMMARIES

Result          Query
   No.  Score   Match  Length  ID      Description            Pred. No.

     1   4216    37.3    1469  B36665  slit protein 2 precur  0.00e+00
     2   4210    37.3    1480  A36665  slit protein 1 precur  0.00e+00
     3   1470    13.0     530  A31640  epidermal growth fact  6.88e-246
     4    735     6.5    1064  A40136  fibropellin Ia - sea   1.20e-105
     5    734     6.5    2321  S78549  notch3 protein - huma  1.85e-105
     6    726     6.4     570  A48836  fibropellin C precurs  5.71e-104
     7    719     6.4    2318  S45306  notch 3 protein - mou  1.15e-102
     8    714     6.3    2524  A35844  Xotch protein - Afric  9.77e-102
     9    708     6.3    2531  S18188  notch protein homolog  1.27e-100
    10    708     6.3    2703  A24420  notch protein - fruit  1.27e-100
    11    705     6.2    2471  A49128  cell-fate determining  4.60e-100
    12    704     6.2    2531  A46019  gene Notch-1 protein   7.05e-100
    13    666     5.9     728  I50719  C-Delta-1 - chicken    7.77e-93
    14    667     5.9    2437  S42612  transmembrane protein  5.08e-93
    15    667     5.9    2555  A40043  notch protein homolog  5.08e-93
    16    650     5.8    1220  A56136  jagged protein precur  7.01e-90
    17    647     5.7    1203  A49175  Motch B protein - mou  2.51e-89
    18    627     5.5     722  I48324  DELTA-like 1 - mouse   1.21e-85
    19    609     5.4    2139  A35672  crumbs protein - frui  2.44e-82
    20    554     4.9     832  A31246  neurogenic protein De  2.69e-72
    21    554     4.9     833  S19087  gene Delta protein pr  2.69e-72
    22    552     4.9     861  A48825  Notch homolog Motch p  6.20e-72
    23    554     4.9     880  S00670  gene Delta protein pr  2.69e-72
    24    558     4.9    1429  S06434  homeotic protein lin-  5.04e-73
    25    522     4.6     293  B26637  neurogenic repetitive  1.68e-66
    26    507     4.5     473  A56175  adhesive plaque prote  8.47e-64
    27    494     4.4     560  A60164  platelet membrane gly  1.83e-61
    28    486     4.3    1091  A58532  glial cell membrane g  4.98e-60
    29    470     4.2     387  B49175  Motch A protein - mou  3.59e-57
    30    478     4.2    1404  A36666  serrate protein precu  1.34e-58
    31    478     4.2    1408  S16148  gene serrate protein   1.34e-58
    32    464     4.1     200  A26637  neurogenic repetitive  4.21e-56
    33    452     4.0    1295  A32901  glpl protein precurso  5.71e-54
    34    428     3.8     259  S48713  fetal antigen 1 - hum  9.96e-50
    35    419     3.7     260  A44549  fetal antigen 1 homeo  3.81e-48
    36    422     3.7     383  B45484  delta-like dlk homeot  1.13e-48
    37    422     3.7     383  S53716  homeotic protein dlk   1.13e-48
    38    422     3.7    1134  A29944  chaoptin precursor -   1.13e-48
    39    417     3.7    1959  AGRT    agrin - rat            8.54e-48
    40    405     3.6     385  S53718  homeotic protein dlk   1.08e-45
    41    404     3.6    4391  A38096  perlecan precursor -   1.61e-45
    42    391     3.5     360  I47020  decorin - rabbit       2.96e-43
    43    395     3.5     385  A54785  preadipocyte factor 1  5.97e-44
    44    395     3.5     536  A34901  lysine carboxypeptida  5.97e-44
    45    393     3.5     907  JE0176  orphan G protein-coup  1.33e-43
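
The "Query Match" column appears to be the hit score expressed as a percentage of the query's Perfect Score. The short Python sketch below checks that reading against the listed values; the formula is an assumption inferred from the numbers and is not stated in the report, and the function name is hypothetical.

```python
def query_match_percent(score, perfect_score):
    """Hit score as a percentage of the query's self-match (perfect) score."""
    return 100.0 * score / perfect_score


PERFECT_SCORE = 11299  # "Perfect Score" from the search header

# Scores of the first three summary rows.
for score in (4216, 1470, 735):
    print(score, round(query_match_percent(score, PERFECT_SCORE), 1))
```

With the Perfect Score of 11299 given in the header, the first three rows come out at the Query Match values listed for results 1, 3 and 4 (37.3, 13.0 and 6.5).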





ALIGNMENTS 


RESULT 1 




ENTRY            B36665 #type complete
TITLE            slit protein 2 precursor - fruit fly (Drosophila
                 melanogaster)
ORGANISM         #formal_name Drosophila melanogaster
DATE             30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change
                 16-Dec-1998
ACCESSIONS       B36665
REFERENCE        A36665
   #authors      Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.;
                 Artavanis-Tsakonas, S.
   #journal      Genes Dev. (1990) 4:2169-2187
   #title        slit: an extracellular protein necessary for development of
                 midline glia and commissural axon pathways contains both
                 EGF and LRR domains.
   #cross-references MUID:91099665
   #accession    B36665
   ##status      preliminary
   ##molecule_type mRNA
   ##residues    1-1469 ##label ROT
   ##cross-references GB:X53959
GENETICS
   #gene         FlyBase:sli
   ##cross-references FlyBase:FBgn0003425
CLASSIFICATION   #superfamily proteoglycan amino-terminal homology; EGF
                 homology; leucine-rich alpha-2-glycoprotein repeat
                 homology; proteoglycan carboxyl-terminal homology
FEATURE
   66-91         #domain proteoglycan amino-terminal homology #label PAH1\
   101-124       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   125-148       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   149-172       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   173-196       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   197-220       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   228-272       #domain proteoglycan carboxyl-terminal homology #label PCS1\
   288-313       #domain proteoglycan amino-terminal homology #label PAH2\
   323-346       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   347-370       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   371-394       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   395-418       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   419-442       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
   450-494       #domain proteoglycan carboxyl-terminal homology #label PCS2\
   512-537       #domain proteoglycan amino-terminal homology #label PAH3\
   547-571       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
   572-595       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
   596-619       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
   620-643       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
   651-695       #domain proteoglycan carboxyl-terminal homology #label PCS3\
   708-733       #domain proteoglycan amino-terminal homology #label PAH4\
   743-766       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
   767-790       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
   846-890       #domain proteoglycan carboxyl-terminal homology #label PCS4\
   1028-1061     #domain EGF homology #label EGF
SUMMARY          #length 1469 #molecular-weight 164695 #checksum 8361

Query Match 37.3%; Score 4216; DB 2; Length 1469;

Best Local Similarity 43.9%; Pred. No. 0.00e+00;

Matches 607; Conservative 328; Mismatches 394; Indels 53; Gaps 34; 

Db 105 LELQGNNLTVIYETDFQRLTKLRMLQLTDNQIHTIERNSFQDLVSLERLDISNNVITTVG 164

1:1 I- I II I I hi I": : I I |||:|:| I :: 
Qy 84 LQLMENKISTIERGAFQDLKELERLRLNRNHLQLFPELLFLGTAKLYRLDLSENQIQAIP 143

Db 165 RRVFKGAQSLRSLQLDNNQITCLDEHAFKGLVELEILTLNNNNLTSLPHNIFGGLGRLRA 224 

I: 1 = 11 :|::: ll::| Mh |:| |: I : :||: 

Qy 144 RKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPKLRT 203 

Db 225 LRLSDNPFACDCHLSWLSRFLRSATRLAPYTRCQSPSQLKGQNVADLHDQEFKCSGLTE- 283 

Ml I : Hllhlll II |:: ll:| : 1 1 : 1 : 1 : 1 1 1 : : : :|| || | 
Qy 204 FRLHSNNLYCDCHLAWLSDWLRKRPRVGLYTQCMGPSHLRGHNVAEVQKREFVCSDEEEG 263 



Db 284 HAP-MECG-AENSCPHPCRCADGIVDCREKSLISVPVTLPDDTTDVRLEQNFITELPPKS 341

I : I : : II :l I" Mill hll :l II: |::||||| I Ml : 
Qy 264 HQSFMAPSCSVLHCPAACTCSNNIVDCRGKGLTEIPTNLPETITEIRLEQNTIKVIPPGA 323



Db 342 FSSFRRLRRIDLSNNNISRIAHDALSGLKQLTTLVLYGNKIKDLPSGVFKGLGSLRLLLL 401
ll::::IIIIIIIH II :l I'M II: I MIIIIMI :|| ::| II Ihllll 
Qy 324 FSPYKKLRRIDLSNNQISEIAPDAFQGLRSLNSLVLYGNKITELPKSLFEGLFSLQLLLL 383

Db 402 NANEISCIRKDAFRDLHSLSLLSLYDNNIQSLANGTFDAMRSMKTVHLAKNPFICDCNLR 461 

III hl:l lll:IM:|:|||||||::|::|:||| ::::: |:||| |||||||:|: 
Qy 384 NANKINCLRVDAFQDLHNLNLLSLYDNKLQTIAKGIFSPLRAIQTMHLAQNPFICDCHLK 443 



Db 462 WLADYLHKNPIETSGARCESPKRMHRRRIESLREEKFKCS-WGELRMKLSGECRMDSDCP 520

IMIIII 1 1 1 1 1 ! I M I ||;|: :| :: ||:|| : | ||||:| | || 
Qy 444 WLADYLHTNPIETSGARCTSPRRLANKRIGQIKSKKFRCSGTEDYRSKLSGDCFADLACP 503



Db 521 AMCHCEGTTVDCTGRRLKEIPRDIPLHTTELLLNDNELGRISSDGLFGRLPHLVKLELKR 580 

:|:|IMIIII: ::h II II hll Ihlh : : hi :||:| |::: 
Qy 504 EKCRCEGTTVDCSNQKLNKIPEHIPQYTAELRLNNNEFTVLEATGIFKKLPQLRKINFSN 563 

Db 581 NQLTGIEPNAFEGASHIQE--L--QL-G-ENKI-K-E I-SNKM FL 616 

I :l II MUM : I I :| ::|: I I : I:: : 
Qy 564 NKITDIEEGAFEGASGVNEILLTSNRLENVQHKMFKGLESLKTLMLRSNRITCVGNDSFI 623 

Db 617 GLHQLKTLNLYDNQISCVMPGSFEHLNSLTSLNLASNPFNCNCHLAWFAECVRKKSLNGG 676 



II :: hlllllh I Ihh 1 : 1 1 : : ! 1 1 : M 1 1 1 1 1 ll|::| Mil : I 
Qy 624 GLSSVRLLSLYDNQITTVAPGAFDTLHSLSTLNLLANPFNCNCYLAWLGEWLRKKRIVTG 683 

Db 677 AARCGAPSKVRDVQIKDLPHSEFKCSSENSE-GCLGDGYCPPSCTCTGTVVACSRNQLKE 735

Ml I ■■■■■■ I h: M I I i M : II Ml III II : II 
Qy 684 NPRCQKPYFLKEIPIQDVAIQDFTCDDGNDDNSCSPLSRCPTECTCLDTVVRCSNKGLKV 743

Db 736 IPRGIPAETSELYLESKEIEQIHYERIRHLRSLTRLDLSNNQITILSNYTFANLTKLSTL 795 

MMM : = 1 1 1 1 : : I : : : I : : : || : : I : Ml MMM I II 

Qy 744 LPKGIPRDVTELYLDGNQFTLVPKE-LSNYKHLTLIDLSNNRISTLSNQSFSNMTQLLTL 802 

Db 796 IISYNKLQCLQRHALSGLNNLRVVSLHGNRISMLPEGSFEDLKSLTHIALGSNPLYCDCG 855 

IMIhhh ::: f I : : 1 1 : : 1 1 1 1 1 1 1 : : 1 1 1 : t : ! I : I : I : I : I : I H 1 1 1 1 
Qy 803 ILSYNRLRCIPPRTFDGLKSLRLLSLHGNDISVVPEGAFNDLSALSHLAIGANPLYCDCN 862

Db 856 LKWFSDWIKLDYVEPGIARCAEPEQMKDKLILSTPSSSFVCRGRVRNDILAKCNACFEQP 915 

: 1 : 1 1 1 : t M MMM I :| j 1 1 : i : 1 1 1 I hi I : 1 1 1 1 1 1 : 1 : I 
Qy 863 MQWLSDWVKSEYKEPGIARCAGPGEMADKLLLTTPSKKFTCQGPVDVNILARCNPCLSNP 922 

Db 916 CQNQAQCVALPQREYQCLCQPGYHGKHCEFMIDACYGNPCRNNATCTVLE--EGRFSCQC 973

I h: I : I hi I I: I h I II Mlh: Ml : I I III 
Qy 923 CKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFWCIC 982 

Db 974 APGYTGARCETNIDDCLGEIKCQNNATCIDGVESYKCECQPGFSGEFCDTKIQFCSPEFN 1033 

I h I II hill : I : i 1 : 1 1 : 1 1 : : : I I I I :Mhh hMh ::| 

Qy 983 ADGFEGENCEVNVDDC-EDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDFCAQDLN 1041

Db 1034 PCANGAKCMDHFTHYSCDCQAGFHGTNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDD 1093 

II : MM : III :|: I M : r 1 1 1 1 : : : J ||: | |::| | | ||; 

Qy 1042 PCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICPEG 1101 

Db 1094 YTGKYCEGHNMISMMYPQTSPCQNHECKHGVCFQPNAQGSDYLCRCHPGYTGKWCEYLTS 1153 

IM Ml : M: I ' M II : I M M I : :: MM III Mill 
Qy 1102 YSGLFCE-FSP-PMVLPRTSPCDNFDCQNGA--QCIVRINEPICQCLPGYQGEKCEKLVS 1157 

Db 1154 ISFVHNNSFVELEPLRTRPEANVTIVFSSAEQNGILMYDGQDAHLAVELFNGRIRVSYDV 1213 

::!::::!:::: : : ! I : : I : I : ::: I : : 1 1 1 : 1 h 1 : 1 1 1 1 : MM III 
Qy 1158 VNFINKESYLQIPSAKVRPQTNITLQIATDEDSGILLYKGDKDHIAVELYRGRVRASYDI 1217 

Db 1214 GNHPVSTMYSFEMVADGKYHAVELLAIKKNFTLRVDRGLARSIINEGSNDYLKLTTPMFL 1273

IMI hMI I : IhM llllh ::M II I :: I I : |:: :|::: 
Qy 1218 GSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITNLSKQSTLNFDSPLYV 1277 

Db 1274 GGLPVDPAQQAYKNWQIRNLTSFKGCMKEVWINHKLVDFGNAQRQQKITPGCAL-LEGE- 1331 

MM : : : M III lh::: II I II : I | Ml 
Qy 1278 GGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQDFQKVPMQTGILPGCEPCHKKVC 1337 

Db 1332 QQEE-EDDEQD-FM-D--ET-PHIKEEPV-DPCLENKCRRGSRCVPNSNARDGYQCKCK 1383 
: : I t : I : :: |||| III :|: hi II M III ■ 
Qy 1338 AHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKCVHGT-CLPI-NAF-SYSCKCL 1394 

Db 1384 HGQRGRYCDQAASTCRKEQVREYYTENDCR-SRQPLKYAKCVGGCGNQCCAAKIVRR-RK 1441 

h I lh I III I I M : I I I : 

Qy 1395 EGHGGVLCDEEEDLFNPCQAIKC-KHGKCRLSGLGQPYCECSSGYTGDSCDREISCRGER 1453 

Db 1442 VR 1443 

M 

Qy 1454 IR 1455



RESULT 2 

ENTRY            A36665 #type complete
TITLE            slit protein 1 precursor - fruit fly (Drosophila
                 melanogaster)
ORGANISM         #formal_name Drosophila melanogaster
DATE             30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change
                 24-Sep-1998
ACCESSIONS       A36665; S13523
REFERENCE        A36665
   #authors      Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.;
                 Artavanis-Tsakonas, S.
   #journal      Genes Dev. (1990) 4:2169-2187
   #title        slit: an extracellular protein necessary for development of
                 midline glia and commissural axon pathways contains both
                 EGF and LRR domains.
   #cross-references MUID:91099665
   #accession    A36665
   ##status      preliminary
   ##molecule_type mRNA
   ##residues    1-1480 ##label ROT
   ##cross-references GB:X53959; NID:g8614; PID:g8615
GENETICS
   #gene         FlyBase:sli
   ##cross-references FlyBase:FBgn0003425
CLASSIFICATION   #superfamily proteoglycan amino-terminal homology; EGF
                 homology; leucine-rich alpha-2-glycoprotein repeat
                 homology; proteoglycan carboxyl-terminal homology
KEYWORDS         alternative splicing
FEATURE
   66-91         #domain proteoglycan amino-terminal homology #label PAH1\
   101-124       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   125-148       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   149-172       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   173-196       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   197-220       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   228-272       #domain proteoglycan carboxyl-terminal homology #label PCS1\
   288-313       #domain proteoglycan amino-terminal homology #label PAH2\
   323-346       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   347-370       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   371-394       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   395-418       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   419-442       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
   450-494       #domain proteoglycan carboxyl-terminal homology #label PCS2\
   512-537       #domain proteoglycan amino-terminal homology #label PAH3\
   547-571       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
   572-595       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
   596-619       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
   620-643       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
   651-695       #domain proteoglycan carboxyl-terminal homology #label PCS3\
   708-733       #domain proteoglycan amino-terminal homology #label PAH4\
   743-766       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
   767-790       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
   791-814       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR17\
   815-838       #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR18\
   846-890       #domain proteoglycan carboxyl-terminal homology #label PCS4\
   1028-1061     #domain EGF homology #label EGF
SUMMARY          #length 1480 #molecular-weight 165751 #checksum 900

Query Match 37.3%; Score 4210; DB 2; Length 1480;

Best Local Similarity 44.5%; Pred. No. 0.00e+00;



Matches 600; Conservative 328; Mismatches 369; Indels 52; Gaps 33; 

Db 105 LELQGNNLTVIYETDFQRLTKLRMLQLTDNQIHTIERNSFQDLVSLERLDISNNVITTVG 164

hi I": I II I I |:| I::: : I I |||:|:| I :: 
Qy 84 LQLMENKISTIERGAFQDLKELERLRLNRNHLQLFPELLFLGTAKLYRLDLSENQIQAIP 143 

Db 165 RRVFKGAQSLRSLQLDNNQITCLDEHAFKGLVELEILTLNNNNLTSLPHNIFGGLGRLRA 224 

I: 1:11 ":IIM MM :| : 1 1 : 1 1 1 1 1 1 1 : 1 h I : :||: 

Qy 144 RKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPKLRT 203 

Db 225 LRLSDNPFACDCHLSWLSRFLRSATRLAPYTRCQSPSQLKGQNVADLHDQEFKCSGLTE- 283 

:M I : llllhlll II I:: hi : 1 1 : 1 : 1 : 1 1 1 : ; ; :|| I I 
Qy 204 FRLHSNNLYCDCHLAWLSDWLRKRPRVGLYTQCMGPSHLRGHNVAEVQKREFVCSDEEEG 263 

Db 284 HAP-MECG-AENSCPHPCRCADGIVDCREKSLTSVPVTLPDDTTDVRLEQNFITELPPKS 341 

1:1:: II :l h: Mill hll :| II: l::||||| I :|| ; 
Qy 264 HQSFMAPSCSVLHCPAACTCSNNIVDCRGKGLTEIPTNLPETITEIRLEQNTIKVIPPGA 323 

Db 342 FSSFRRLRRIDLSNNNISRIAHDALSGLKQLTTLVLYGNKIKDLPSGVFKGLGSLRLLLL 401 

ll::::HIIIIIII II :| II: lh I Mlllllll :|| ::| II Ihllll 
Qy 324 FSPYKKLRRIDLSNNQISELAPDAFQGLRSLNSLVLYGNKITELPKSLFEGLFSLQLLLL 383 

Db 402 NANEISCIRKDAFRDLHSLSLLSLYDNNIQSLANGTFDAMKSMKTVHLAKNPFICDCNLR 461 

III l:hl lll:|ll:|:IHIIIh:|::|:||| |:||| llllllhh 
Qy 384 NANKINCLRVDAFQDLHNLNLLSLYDNKLQTIAKGTFSPLRAIQTMHLAQNPFICDCHLK 443 

Db 462 WLADYLHRNPIETSGARCESPKRMHRRRIESLREEKFKCS-WGELRMKLSGECRMDSDCP 520 

lllllll Mllllllll Ihh :|| ;: ||:|| : I ||||:| | 
Qy 444 WLADYLHTNPIETSGARCTSPRRLANKRIGQIKSKKFRCSGTEDYRSKLSGDCFADLACP 503

Db 521 AMCHCEGTTVDCTGRRLKEIPRDIPLHTTELLLNDNELGRISSDGLFGRLPHLVKLELKR 580 

:|:MIIIII|: ::h II II hll l|;||: : : |:| :||:| I:;: 
Qy 504 EKCRCEGTTVDCSNQKLNKIPEHIPQYTAELRLNNNEFTVLEATGIFKKLPQLRKINFSN 563 

Db 581 NQLTGIEPNAFEGASHIQE--L---QL-G-ENKI-K--E I-SNKM FL 616 

I :| II IIMII : II ;| :;|: I I : ||:: : 
Qy 564 NKITDIEEGAFEGASGVNEILLTSNRLENVQHKMFKGLESLKTLMLRSNRITCVGNDSFI 623 

Db 617 GLHQLKTLNLYDNQISCVMPGSFEHLNSLTSLNLASNPFNCNCHLAWFAECVRKKSLNGG 676 

II. :: hllllll: I Ihh l:ll::MI :lllllll I!:: :||| : | 
Qy 624 GLSSVRLLSLYDNQITTVAPGAFDTLHSLSTLNLLANPFNCNCYLAWLGEWLRKKRIVTG 683 

Db 677 AARCGAPSKVRDVQIKDLPHSEFKCSSENSE-GCLGDGYCPPSCTCTGTWACSRNQLKE 735 

:M I =::: I I" =1 I I : :| : II III III II : II 
Qy 684 NPRCQKPYFLKEIPIQDVAIQDFTCDDGNDDNSCSPLSRCPTECTCLDTWRCSNKGLKV 743 

Db 736 IPRGIPAETSELYLESNEIEQIHYERIRHLRSLTRLDLSNNQITILSNYTFANLTKLSTL 795 

mill : mi|::|:: : I : : : II :|||||:|; III :|:|:| | || 
Qy 744 LPKGIPRDVTELYLDGNQFTLVPRE-LSNYKHLTLIDLSNNRISTLSNQSFSNMTQLLTL 802 

Db 796 IISYNKLQCLQRHALSGLNNLRWSLHGNRISMLPEGSFEDLKSLTHIALGSNPLYCDCG 855 

1:111:1:1: ::: ll::ll::lllll l|::lll:|:|| : 1 : 1 : 1 : 1 : 1 1 1 1 1 1 1 
Qy 803 ILSYNRLRCIPPRTFDGLKSLRLLSLHGNDISWPEGAFNDLSALSHLAIGANPLYCDCN 862 

Db 856 LKWFSDWIKLDYVEPGIARCAEPEQMKDKLILSTPSSSFVCRGRVRNDILAKCNACFEQP 915 

: 1:111:1 :| Hllllll I :| IM:|:||| I hi I : 1 1 M M : I : I 
Qy 863 MQWLSDWVKSEYKEPGIARCAGPGEMADKLLLTTPSKKFTCQGPVDVNILAKCNPCLSNP 922 

Db 916 CQNQAQCVALPQREYQCLCQPGYHGKHCEFMIDACYGNPCRNNATCTVLE--EGRFSCQC 973 

I I:: I : I hi I h I h Ml : I M : : :|| : I I III 
Qy 923 CKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFWCIC 982 

Db 974 APGYTGARCETNIDDCLGEIKCQNNATCIDGVESYKCECQPGFSGEFCDTKIQFCSPEFN 1033 

I h I II hill : Ml:il:li:::| I | | ::||:|: ::||: ::!' 

Qy 983 ADGFEGENCEVNVDDC-EDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDFCAQDLN 1041 

Db 1034 PCANGAKCMDHFTHYSCDCQAGFHGTNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDD 1093 

II : :lh : III :|: I :| ::||||:::| lh I |::| I I lh 

Qy 1042 PCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICPEG 1101 

Db 1094 YTGKYCEGHNMISMMYPQTSPCQNHECKHGVCFQPNAQGSDYLCRCHPGYTGKWCEYLTS 1153

1:1 :H t :h hlllhl :| :| I : :: :|:| III 

Qy 1102 YSGLFCE-FSP-PMVLPRTSPCDNFDCQNGA--QCIVRINEPICQCLPGYQGEKCEKLVS 1157

Db 1154
Qy 1158

Db 1214
Qy 1218

Db 1274
Qy 1278

Db 1332
Qy 1338

Db 1384
Qy 1395

ll::|:|:
I: Mill: Ihl Ml
1 : 1 1 I : Mill: :::| II I :: I I
: III II:::: II I II : I I III
III! III :|: hi I! :| III
I: I II: I :|

RESULT 3
ENTRY A31640 #type fragment
TITLE epidermal growth factor-like protein slit - fruit fly
(Drosophila melanogaster) (fragment)
ORGANISM #formal_name Drosophila melanogaster
DATE 28-Feb-1990 #sequence_revision 28-Feb-1990 #text_change
14-Aug-1998
ACCESSIONS A31640
REFERENCE A31640
#authors Rothberg, J.M.; Hartley, D.A.; Walther, Z.;
Artavanis-Tsakonas, S.
#journal Cell (1988) 55:1047-1059
#title slit: An EGF-homologous locus of D. melanogaster involved in
the development of the embryonic central nervous system.
#cross-references MUID:89077533
#accession A31640
##molecule_type DNA
##residues 1-530 ##label ROT
##cross-references GB:M23543; NID:g340939; PID:g514357
GENETICS
#gene FlyBase:sli
##cross-references FlyBase:FBgn0003425
#introns 470/3
CLASSIFICATION #superfamily EGF homology
KEYWORDS growth factor
FEATURE
148-181 #domain EGF homology #label EGF
SUMMARY #length 530 #checksum 6330

Query Match 13.0%; Score 1470; DB 2; Length 530;
Best Local Similarity 39.1%; Pred. No. 6.88e-246;
Matches 206; Conservative 132; Mismatches 168; Indels 21; Gaps 17;


Db 1 MKDKLILSTPSSSFVCRGRVRNDILAKCNACFEQPCQNQAQCVALPQREYQCLCQPGYHG 60
1 111:1:111 1 hi 1 :MIIII:|: II |:: 1 : 1 |:| 1 |: 1
Qy 888 MADKLLLTTPSKKFTCQGPVDVNILAKCNPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKG 947

Db 61 KHCEFMIDACYGNPCRNNATCTVLE--EGRFSCQCAPGYTGARCETNIDDCLGEIKCQNN 118
1: 1 M :IH:: :|l : 1 1 1 1 II |: 1 II l:lll : l:|l
Qy 948 QDCDVPIHACISNPCKHGGTCHLKEGEEDGFWCICADGFEGENCEVNVDDC-EDNDCENN 1006

Db 119 ATCIDGVESYKCECQPGFSGEFCDTKIQFCSPEFNPCANGAKCMDHFTHYSCDCQAGFHG 178
:||:M:::| 1 1 1 ::||:|: |::||: ::||| : :||: : III :|: I
Qy 1007 STCVDGINNYTCLCPPEYTGELCEEKLDFCAQDLNPCQHDSKCILTPKGFKCDCTPGYVG 1066

Db 179 TNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMISMMYPQTSPCQNH 238
:| ::IUI:::| 11: 1 |::| 1 1 II: hi :|| : :|: |:||||:|
Qy 1067 EHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICPEGYSGLFCE-FSP-PMVLPRTSPCDNF 1124

Db 239 ECKHGVCFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELEPLRTRPEANVTI 298
:| :| 1 : :: :|:| III 1 II 1 1 :: 1 :: ::| ::: : : : ||::|:|:
Qy 1125 DCQNGA--QCIVRINEPICQCLPGYQGEKCEKLVSVNFINKESYLQIPSAKVRPQTNITL 1182



::: :: Mhl h 1 : 1 1 1 1 : Ihl III hll |::|| I : lh:| III! . 



Qy 


1125 


Db 


299 


Qy, 


1183 


Db 


358 


Qy 


1243 


Db 


418 


Qy 


1303 


Db 


469 


Qy 


1363 



h :::| II I :: I I : |:: :|:::||:| 



I III 



MM III :h hi II M III h I lh I 



RESULT 4 

ENTRY A40136 #type complete
TITLE fibropellin Ia - sea urchin (Strongylocentrotus purpuratus)
ALTERNATE_NAMES epidermal growth factor homolog precursor
CONTAINS alternatively spliced fibropellin Ib (EGFI)
ORGANISM #formal_name Strongylocentrotus purpuratus #common_name
purple urchin
DATE 13-May-1992 #sequence_revision 17-Sep-1997 #text_change
07-Aug-1998
ACCESSIONS A40136; B40136; C40136; A29316; A43131
REFERENCE A40136
#authors Delgadillo-Reynoso, M.G.; Rollo, D.R.; Hursh, D.A.; Raff,
R.A.
#journal J. Mol. Evol. (1989) 29:314-327
#title Structural analysis of the uEGF gene in the sea urchin
Strongylocentrotus purpuratus reveals more similarity to
vertebrate than to invertebrate genes with EGF-like
repeats.
#cross-references MUID:90112459
#accession A40136
##status preliminary
##molecule_type mRNA
##residues 1-114 ##label DEL
##cross-references GB:X17530; NID:g10225; PID:g667061
#accession B40136
##status preliminary; not compared with conceptual translation
##molecule_type DNA
##residues 181-251,329-370,'R',372-408,'RA',411-441 ##label DE2
#accession C40136
##status preliminary; not compared with conceptual translation
##molecule_type DNA
##residues T,747-821,898-978 ##label DE3
REFERENCE A29316
#authors Hursh, D.A.; Andrews, M.E.; Raff, R.A.
#journal Science (1987) 237:1487-1490
#title A sea urchin gene encodes a polypeptide homologous to
epidermal growth factor.
#cross-references MUID:87319677
#accession A29316
##status preliminary
##molecule_type mRNA
##residues 'S',280-481,786-1064 ##label HUR
##cross-references GB:M17421; NID:g161474; PID:g552260
REFERENCE A43131
#authors Hunt, L.T.; Barker, W.C.
#journal FASEB J. (1989) 3:1760-1764
#title Avidin-like domain in an epidermal growth factor homolog from
a sea urchin.
#cross-references MUID:89196806
#contents annotation
COMMENT EGF homology repeats 10-17 are spliced out in the short form
(fibropellin Ib).
CLASSIFICATION #superfamily C1r/C1s repeat homology; EGF homology
FEATURE
1-19 #domain signal sequence #status predicted #label SIG\
20-1064 #product fibropellin I #status predicted #label FIB\
23-54 #domain EGF homology #label EG01\
57-175 #domain C1r/C1s repeat homology #label CSR\
180-211 #domain EGF homology #label EG02\
218-249 #domain EGF homology #label EG03\
256-287 #domain EGF homology #label EG04\
294-325 #domain EGF homology #label EG05\
332-363 #domain EGF homology #label EG06\
370-401 #domain EGF homology #label EG07\
408-439 #domain EGF homology #label EG08\
446-477 #domain EGF homology #label EG09\
484-515 #domain EGF homology #label EG10\
522-553 #domain EGF homology #label EG11\
560-591 #domain EGF homology #label EG12\
598-629 #domain EGF homology #label EG13\
636-667 #domain EGF homology #label EG14\
674-705 #domain EGF homology #label EG15\
712-743 #domain EGF homology #label EG16\
750-781 #domain EGF homology #label EG17\
788-819 #domain EGF homology #label EG18\
826-857 #domain EGF homology #label EG19\
864-895 #domain EGF homology #label EG20\
902-933 #domain EGF homology #label EG21\
936-1064 #region avidin-like\
23-34,28-43,45-54,
62-88,180-191,
185-200,202-211,
218-229,223-238,
240-249,256-267,
261-276,278-287,
294-305,299-314,
316-325,332-343,
337-352,354-363,
370-381,375-390,
392-401,408-419,
413-428,430-439,
446-457,451-466,
468-477,484-495,
489-504,506-515,
522-533,527-542,
544-553,560-571,
565-580,582-591,
598-609,603-618,
620-629,636-647,
641-656,658-667,
674-685,679-694,
696-705,712-723,
717-732,734-743,
750-761,755-770,
772-781,788-799,
793-808,810-819,
826-837,831-846,
848-857,864-875,
869-884,886-895,
902-913,907-922,
924-933
#disulfide_bonds #status predicted\
#disulfide_bonds #status predicted
SUMMARY #length 1064 #molecular-weight 112072 #checksum 303



Query Match 6.5%; Score 735; DB 2; Length 1064;
Best Local Similarity 44.5%; Pred. No. 1.20e-105;
Matches 106; Conservative 38; Mismatches 81; Indels 13;



Db 254 NECASSPCLNGGIC-VDGVNMFECTCLAGFTGVRCEVNIDECASAPCQNGGIC-I-DGI- 309
I I hll III I |: : III IN Ml | | || :|| I : :|
Qy 916 NPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEE 975

Db 310 NGYTCSCPLGFSGDNCENNDDECSSIPCLNGGTCTOLVNAYMCVCAPGWTGPTCADNIDE 369 

:|: I I: II hill I hi II :|||| :| i |:|:| I I :::| 
Qy 976 DGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDF 1035 

Db 370 CA-S-APCQNGGVCIDGVNGYMCDCQPGYTGTHCETDIDECARPPCQNGGDCVDGVNGYV 427 
M llh : II :|::||| III I lh hhl I lh I hllll 



Qy 1036 CAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYT 1095 

Db 428 CICAPGFDGLNCE-NN--I — DECASRPCQNGAVCVDGVNGFVCTCSAGYTGVLCE 478 

llh hllll; : | : Mill |: :| :| I :|| | 
Qy 1096 CICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCE 1153



RESULT 5
ENTRY S78549 #type complete
TITLE notch3 protein - human
ORGANISM #formal_name Homo sapiens #common_name man
DATE 24-Jul-1998 #sequence_revision 24-Jul-1998 #text_change
17-Mar-1999
ACCESSIONS S78549; S71825
REFERENCE S78549
#authors Joutel, A.; Tournier-Lasserve, E.
#submission submitted to the EMBL Data Library, April 1997
#accession S78549
##molecule_type mRNA
##residues 1-2321 ##label JOU1
##cross-references EMBL:U97669; NID:g2668591; PID:g2668592
REFERENCE S71825
#authors Joutel, A.; Corpechot, C.; Ducros, A.; Vahedi, K.; Chabriat,
H.; Mouton, P.; Alamowitch, S.; Domenga, V.; Cecillion, M.;
Marechal, E.; Maciazek, J.; Vayssiere, C.; Cruaud, C.;
Cabanis, E.A.; Ruchoux, M.M.; Weissenbach, J.; Bach, J.F.;
Bousser, M.G.; Tournier-Lasserve, E.
#journal Nature (1996) 383:707-710
#title Notch3 mutations in CADASIL, a hereditary adult-onset
condition causing stroke and dementia.
#cross-references MUID:97032728
#accession S71825
##status nucleic acid sequence not shown
##molecule_type DNA
##residues 67-113;138-194;268-333,'G',335-346;536-613;716-765;
1240-1279;1815-1888 ##label JOU2
##cross-references EMBL:U97669
GENETICS
#gene notch3
#map_position 19p13.1
FUNCTION
#description may be involved in pathogenesis of CADASIL, causing a type of
stroke and dementia
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology; EGF homology
KEYWORDS tandem repeat; transmembrane protein
FEATURE
318-349 #domain EGF homology #label EGF\
1838-1870 #domain ankyrin repeat homology #label AN1\
1871-1903 #domain ankyrin repeat homology #label AN2\
1905-1937 #domain ankyrin repeat homology #label AN3\
1938-1970 #domain ankyrin repeat homology #label AN4\
1971-2003 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2321 #molecular-weight 243657 #checksum 3337



Query Match 6.5%; Score 734; DB 2; Length 2321; 

Best Local Similarity 31.5%; Pred. No. 1.85e-105; 



I :::lh: III : I I II II h I hi :| ::||::||||: h 



: I I lllhlllllllll : I I :Mllhl I I III! Ih:| I 



I : hh: : I: I I I I I II ::||| I :|| | | | : 



Matches 


Db 


160 


Qy 


916 


Db 


218 


Qy 


975 


Db 


277 


Qy 


1035 


Db 


337 



Tue Jun 1 10:16:11 1999    US-09-191-647-2.rpr    Page 6



Qy 1095 TCICPEGYSGLFCEFSPPMVLPRTSPC-DNFDCQNGAQCIVRINEPICQCLPGYQGEKCE 1153 



II I 1 : 1 1 I I : I I 



Qy 


1095 


Db 


390 


Qy 




Db 


446 


Qy 


1214 


Db 


498 


Qy 


1274 


Db 


553 


Qy 


1334 


Db 


608 


i 


1393 


I : I I! II
I I II I: : I : I : I III |: : I I I :|:ll |: lh I
Qy 1393 CLEGHGGVLCDEEEDLFNPCQAIKCKHGKCRLSGLGQPY-CECSSGYTGDSCDREIS-C 1449

RESULT 6
ENTRY A48836 #type complete
TITLE fibropellin C precursor - sea urchin (Strongylocentrotus
purpuratus)
ALTERNATE_NAMES EGF repeat-containing protein; epidermal growth
factor-related protein 3; fibropellin III
ORGANISM #formal_name Strongylocentrotus purpuratus #common_name
purple urchin
DATE 01-Dec-1993 #sequence_revision 18-Nov-1994 #text_change
07-Aug-1998
ACCESSIONS A48836
REFERENCE A48836
#authors Bisgrove, B.W.; Raff, R.A.
#journal Dev. Biol. (1993) 157:526-538
#title The SpEGF III gene encodes a member of the fibropellins: EGF
repeat-containing proteins that form the apical lamina of
the sea urchin embryo.
#cross-references MUID:93273088
#accession A48836
##status preliminary
##molecule_type mRNA
##residues 1-570 ##label BIS
##cross-references GB:L07045; NID:g310659; PID:g310660
##note sequence extracted from NCBI backbone (NCBIN:132724,
NCBIP:132725)
CLASSIFICATION #superfamily C1r/C1s repeat homology; EGF homology



FEATURE
1-18 #domain signal sequence #status predicted #label SIG\
19-570 #product fibropellin C #status predicted #label FIB\
19-54 #domain EGF homology #label EGF1\
57-175 #domain C1r/C1s repeat homology #label C1R\
176-211 #domain EGF homology #label EGF2\
214-249 #domain EGF homology #label EGF3\
252-287 #domain EGF homology #label EGF4\
290-325 #domain EGF homology #label EGF5\
328-363 #domain EGF homology #label EGF6\
366-401 #domain EGF homology #label EGF7\
404-439 #domain EGF homology #label EGF8\
442-570 #region avidin-like\
23-34,28-43,45-54,
62-88,180-191,
185-200,202-211,
218-229,223-238,
240-249,256-267,
261-276,278-287,
294-305,299-314,
316-325,332-343,
337-352,354-363,
370-381,375-390,
392-401,408-419,
413-428,430-439 #disulfide_bonds #status predicted
SUMMARY #length 570 #molecular-weight 61115 #checksum 5567

Query Match 6.4%; Score 726; DB 2; Length 570; 

Best Local Similarity 40.3%; Pred. No. 5.71e-104;

Matches 96; Conservative 54; Mismatches 75; Indels 13; Gaps



: I :MI I :II I I: hi |: II |::|: I I I ||::||:| : : 



:|: I I 1 1 : 1 III h::| I I : llllll ::| I I Ml III 



Matches 


Db 


178 


Qy 


916 


Db 


234 


Qy 


976 


Db 


294 


Qy 


1036 


Db 


352 


Qy 


1096 



II :|lh : I 



: III :h I :|: ::::| : I II: I hhll 



I I 1:1 1:1 I :| :: MM: | 



RESULT 7 

ENTRY S45306 #type complete
TITLE notch 3 protein - mouse
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 20-Feb-1995 #sequence_revision 20-Feb-1995 #text_change
10-Jul-1998
ACCESSIONS S45306
REFERENCE S45306
#authors Lardelli, M.; Dahlstrand, J.; Lendahl, U.
#journal Mech. Dev. (1994) 46:123-136
#title The novel Notch homologue mouse Notch 3 lacks specific
epidermal growth factor-repeats and is expressed in
proliferating neuroepithelium.
#cross-references MUID:95001556
#accession S45306
##status preliminary
##molecule_type mRNA
##residues 1-2318 ##label LAR
##cross-references EMBL:X74760; NID:g483580; PID:g483581
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology
FEATURE
1839-1871 #domain ankyrin repeat homology #label AN1\
1872-1904 #domain ankyrin repeat homology #label AN2\
1906-1938 #domain ankyrin repeat homology #label AN3\
1939-1971 #domain ankyrin repeat homology #label AN4\
1972-2004 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2318 #molecular-weight 244245 #checksum 9358



Query Match 6.4%; Score 719; DB 2; Length 2318;
Best Local Similarity 31.2%; Pred. No. 1.15e-102;



I I M: |||:| | || || |: | |: |: :| ::||::|||| 



: I I MMMIIMIMI : I I Mlllhl I I III! IMM I M 



I I I I II I :MM Mill 



Matches 


Db 


161 


Qy 


916 


Db 


218 


Qy 


975 


Db 


278 


Qy 


1035 


Db 


338 


Qy 


1095
Db


391 


Qy 




Db 


447 


Qy 




Db 


499 


Qy 


1274 


Db 


554 


Qy 


1334 




609 


i 


1393 



I I 



I I 



II I Ml 



I II I 



I I II h : I : I : I I II 



I : I III 



I I I :|:M I: II: I 



RESULT 8 

ENTRY A35844 #type complete
TITLE Xotch protein - African clawed frog
ORGANISM #formal_name Xenopus laevis #common_name African clawed frog
DATE 12-Oct-1990 #sequence_revision 12-Oct-1990 #text_change
14-Aug-1998
ACCESSIONS A35844
REFERENCE A35844
#authors Coffman, C.; Harris, W.; Kintner, C.
#journal Science (1990) 249:1438-1441
#title Xotch, the Xenopus homolog of Drosophila notch.
#cross-references MUID:90385285
#accession A35844
##status preliminary; nucleic acid sequence not shown; not
compared with conceptual translation
##molecule_type mRNA
##residues 1-2524 ##label COF
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology; EGF homology
KEYWORDS transmembrane protein
FEATURE
222-254 #domain EGF homology #label EGF\
1924-1956 #domain ankyrin repeat homology #label AN1\
1957-1989 #domain ankyrin repeat homology #label AN2\
1991-2023 #domain ankyrin repeat homology #label AN3\
2024-2056 #domain ankyrin repeat homology #label AN4\
2057-2089 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2524 #molecular-weight 274931 #checksum 9441

Query Match 6.3%; Score 714; DB 2; Length 2524;
Best Local Similarity 31.1%; Pred. No. ?.77e-102;



Matches 


Db 


182 


Qy 


916 


Db 


239 


Qy 


976 


Db 


299 


Qy 


1036 


Db 


359 


Qy 


1096 


Db 


412 


Qy 


' 1155 



i i inn i i 



lllll I I::|| I :| ::|| :IMI 



:IMI hill 1:1 I I I III: II: I I :| 



I I l::l I I III ::IM : Ml' I I 



i ii i mm 



m : m 



I ::| 



I m I III I |:: 



Db 


464 


Qy 


1215 


Db 


518 


Qy 


1275 


Db 


573 


Qy 


1333 


Db 


628 


Qy 


1392 


Db 


683 


Qy 


1450 



i mi: III 



I III : I III I: I I III I: II:: |::| I 



Mill: 1:1: :||l : : :| II :|lll h :|: I 



RESULT 9 

ENTRY S18188 #type complete
TITLE notch protein homolog - rat
ORGANISM #formal_name Rattus norvegicus #common_name Norway rat
DATE 19-Feb-1994 #sequence_revision 10-Nov-1995 #text_change
12-Feb-1999
ACCESSIONS S18188
REFERENCE S18188
#authors Weinmaster, G.; Roberts, V.J.; Lemke, G.
#journal Development (1991) 113:199-205
#title A homolog of Drosophila Notch expressed during mammalian
#cross-references MUID:92111383
#accession S18188
##molecule_type mRNA
##residues 1-2531 ##label WEI
##cross-references EMBL:X57405; NID:g57634; PID:g57635
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology
FEATURE
1917-1949 #domain ankyrin repeat homology #label AN1\
1950-1982 #domain ankyrin repeat homology #label AN2\
1984-2016 #domain ankyrin repeat homology #label AN3\
2017-2049 #domain ankyrin repeat homology #label AN4\
2050-2082 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2531 #molecular-weight 270907 #checksum 2705

Query Match 6.3%; Score 708; DB 2; Length 2531;
Best Local Similarity 29.1%; Pred. No. 1.27e-100;



ii im m 



in mi im m : i m in im i n ::i m ; n 

Qy 980 CICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDFCAQD 1039



II : :N: I: : I II III ll::|:|:|: :| I I |:| :|hl 



Matches 


Db 


420 


Qy 


920 


Db 


476 


Qy 


980 


Db 


535 


Qy 


1040 


Db 


593 


Qy 


1100 


Db 


646 


Qy 


1157 



I : l::|: I II :IMI I :|| :| 



II :|
Db


706 


Qy 


1213 


Db 


764 


Qy 


1272 


Db 


822 


Qy 
Db 


1323 
881 


Qy 


1379 


Db 


936 


Qy 


1437 



I I I I :| 



I::: I I :| : ::| 



II I : I I I : :|:l I II I |: : I; : I 



GASCQNTNG - S YRCLCQAGYTGRNCESDID - • D ■ CRPNPCHNGGSCT - DGVNAAFCDCLP 935 

I: I I: II I I I I I: : I : I:: I :| I |: ::|:| : 



I: I I: :h I 



RESULT 10
ENTRY A24420 #type complete
TITLE notch protein - fruit fly (Drosophila melanogaster)
ALTERNATE_NAMES neurogenic repetitive locus protein
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Jun-1987 #sequence_revision 30-Jun-1987 #text_change
07-Aug-1998
ACCESSIONS A24420; A24768; S09358; A05267
REFERENCE A24420
#authors Kidd, S.; Kelley, M.R.; Young, M.W.
#journal Mol. Cell. Biol. (1986) 6:3094-3108
#cross-references MUID:87064624
#accession A24420
##molecule_type DNA
##residues 1-2703 ##label KID
##cross-references GB:K03508; NID:g157991; PID:g157993
REFERENCE A24768
#authors Wharton, K.A.; Johansen, K.M.; Xu, T.; Artavanis-Tsakonas, S.
#journal Cell (1985) 43:567-581
#cross-references MUID:86079539
#accession A24768
##molecule_type mRNA
##residues 1-48,'T',50-118,'R',120-230,'I',232-256,'W',258-266,'A',
268-872,'R',874-958,'R',960-1970,'FH',1973-2256,'G',
2258-2264,'V',2266-2406,'R',2408-2444,'L',2446-2703
##label WHA1
##note the authors translated the codon ATC for residue 49 as
Thr, ATT for residue 2044 as Arg, GTA for residue 2265
as Ala, CGC for residue 2407 as His, and CTT for
residue 2445 as Arg
REFERENCE S09358
#authors Tautz, D.
#journal Nucleic Acids Res. (1989) 17:6463-6471
#title Hypervariability of simple sequences as a general source for
polymorphic DNA markers.
#cross-references MUID:89385974
#accession S09358
##molecule_type DNA
##residues 2505-2551,'QQQQ',2552-2576,'E',2578-2604 ##label TAU
REFERENCE A05267
#authors Wharton, K.A.; Yedvobnick, B.; Finnerty, V.G.;
Artavanis-Tsakonas, S.
#journal Cell (1985) 40:55-62
#title opa: a novel family of transcribed repeats shared by the
Notch locus and other developmentally regulated loci in D.
melanogaster.
#cross-references MUID:85099329
#accession A05267
##molecule_type DNA
##residues 2504-2576,'E',2578-2611 ##label WHA2
GENETICS
#gene notch; opa
##cross-references FlyBase:FBgn0004647
#map_position 8.96-9.36
#introns 53/3; 84/3; 171/3; 240/3; 283/3; 2333/3; 2436/3; 2588/3
CLASSIFICATION #superfamily notch protein; ankyrin repeat homology; EGF
homology
KEYWORDS differentiation; tandem repeat; transmembrane protein
FEATURE
27-43 #domain transmembrane #status predicted #label TMM1\
568-599 #domain EGF homology #label EGF\
1746-1762 #domain transmembrane #status predicted #label TMM2\
1950-1982 #domain ankyrin repeat homology #label AN1\
1983-2015 #domain ankyrin repeat homology #label AN2\
1988-2004 #domain transmembrane #status predicted #label TMM3\
2017-2049 #domain ankyrin repeat homology #label AN3\
2050-2082 #domain ankyrin repeat homology #label AN4\
2083-2115 #domain ankyrin repeat homology #label AN5\
2538-2568 #region glutamine-rich\
2538-2568 #domain neurogenic repetitive element #status predicted
#label OPA
SUMMARY #length 2703 #molecular-weight 288876 #checksum 6404

Query Match 6.3%; Score 708; DB 2; Length 2703; 

Best Local Similarity 44.8%; Pred. No. 1.27e-100; 



I I hll hhl II I III III :|:: I I III! : III! 



:|l I II II I h : i : ! 1 1 : | | : | |:| |:| ||| ||| || ::: 



Matches 


Db 


490 


Qy 


916 


Db 


546 


Qy 


976 


Db 


606 


Qy 


1036 


Db 


663 


Qy 


1096 


Db 


716 


Qy 


1156 



I I III:: :lll ::lll I III I |: 



:|: II :|| I I :| 



I I I II II 



--V--NVNECHSNPCNNGATCIDGINSYKCQCVPGFTGQHCEKN 715 

I : I : I III II II 111:11: |: III 



RESULT 11 

ENTRY A49128 #type complete
TITLE cell-fate determining gene Notch2 protein - rat
ORGANISM #formal_name Rattus norvegicus #common_name Norway rat
DATE 21-Jan-1994 #sequence_revision 18-Nov-1994 #text_change
14-Aug-1998
ACCESSIONS A49128
REFERENCE A49128
#authors Weinmaster, G.; Roberts, V.J.; Lemke, G.
#journal Development (1992) 116:931-941
#title Notch2: a second mammalian Notch gene.
#cross-references MUID:93202015
#accession A49128
##status preliminary; not compared with conceptual translation
##molecule_type mRNA
##residues 1-2471 ##label WEI
##experimental_source Schwann cell
##note sequence extracted from NCBI backbone (NCBIP:127811)
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology; EGF homology
FEATURE
1029-1060 #domain EGF homology #label EGF\
1876-1908 #domain ankyrin repeat homology #label AN1\
1909-1941 #domain ankyrin repeat homology #label AN2\
1943-1975 #domain ankyrin repeat homology #label AN3\
1976-2008 #domain ankyrin repeat homology #label AN4\
2009-2041 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2471 #molecular-weight 265367 #checksum 5929

Query Match 6.2%; Score 705; DB 2; Length 2471;

Best Local Similarity 29.5%; Pred. No. 4.60e-100;

Matches 162; Conservative 118; Mismatches 220; Indels 49; Gaps 38; 

Db 184 NECDIPGRCQHGGTCLNLPGS-YRCQCPQRFTGQHCDSPYVPCAPSPCVNGGTCR-QTGD 241

II:: I : III : I III II I II II I :| ::|| : 1 1 1 1 : |:
Qy 916 NPC-LSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGE 974

Db 242 FTSE-CHCLPGFEGSNCERNIDDCPNHKCQNGGVCVDGVNTYNCRCPPQWTGQFCTEDVD 300 

: I I Nil III hill :: hi : MM: | | MM ||::| | M 
Qy 975 EDGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLD 1034

Db 301 ECLLQPNACQNGGTCTNRNGGYGCVCVNGWSGDDCSENIDDCAFASCTPGSTCIDRVASF 360 
I : MM : | |: | | | |: | :MM I Mill::
Qy 1035 FCAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGY 1094

Db 361 SCLCPEGKAGLLCH L--DDACISNPCHKGALCDTNPLNGQYICTCPQAYKGADC 412

MMM : 1 1 : 1 | M : MM I M II I M I 

Qy 1095 TCICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQC-IVRINEP-ICQCLPGYQGEKC 1152 

Db 413 TEDVDECAMANSNPC-E-HAGKC-VNTDGAFHCEC-LK-G— YAGPRCEMDINECHSDP 464 

I : I :; : ::| |: ::: III: ■: :: :; 
Qy 1153 -EKLVSVNFINKESYLQIPSAKVRPQTNITLQIATDEDSGILLYKGDKDHIAVELYRGRV 1211 

Db 465 CQN- DATCLDKIGGFTCLCMP - G - FKGVHCELEYNECQSNPC - VNNGQCVDKVNRFQCLC 520 

: I: : :: ; | | | | ::: | : | ; | 
Qy 1212 RASYDTGSHPASAIYSVETINDGNFHIVE-LLALDQSLSLSVDGGNPKIITNLSKQSTLN 1270

Db 521 --PPGFTGPVC-QIDIDDCSSTPCLNGAKCIDH-PNGY-ECQCATGF-TGTLCDENIDNC 574 

:| : I : :: :| ||: I I : : I ; : | 

Qy 1271 FDSPLYVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSE-LQDFQKVPMQTGILPGC 1329 

Db 575 DP-DP- -CHHGQCQDGIDS-YTCICNPGYMGAICSDQIDE-CYSSPCLNDGRCIDLVH-G 628 

:| I II II : :: Ml I I ll::| :: :: I :: |:: I M : : 
Qy 1330 EPCHKKVCAHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKCVH-GTCLPINAFS 1388 

Db 629 YQCNCQPGTSGLNC--EIN-FDDCASNPCLHGAC-VDGINR-YSCVCSPGFTGQRCNIDI 683 

I hi I :h I I : I: I : I II I : h : I I Ihhlh h :| 
Qy 1389 YSCKCLEGHGGVLCDEEEDLFNPCQAIKCKHGKCRLSGLGQPY-CECSSGYTGDSCDREI 1447 



I 

rose 



Db 684 DECASNPCR 692

I :: I
Qy 1448 S-CRGERIR 1455
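
The statistics printed for the record above (Matches 162; Conservative 118; Mismatches 220; Indels 49; Best Local Similarity 29.5%) are consistent with the similarity figure being the identical matches taken as a percentage of all aligned columns. That relationship is inferred from the numbers in this listing only, not from the search program's documentation, so the sketch below is an assumption; the function names `parse_counts` and `local_similarity` are hypothetical.

```python
import re


def parse_counts(line):
    """Parse 'Name value;' pairs from an alignment statistics line into a dict of ints."""
    pairs = re.findall(r"([A-Za-z ]+?)\s+(\d+);", line)
    return {name.strip(): int(value) for name, value in pairs}


def local_similarity(counts):
    """Identical matches as a percentage of aligned columns (assumed definition)."""
    columns = (counts["Matches"] + counts["Conservative"]
               + counts["Mismatches"] + counts["Indels"])
    return 100.0 * counts["Matches"] / columns


line = "Matches 162; Conservative 118; Mismatches 220; Indels 49; Gaps 38;"
counts = parse_counts(line)
print(round(local_similarity(counts), 1))  # 29.5
```

The same arithmetic reproduces the other fully legible statistics lines in this listing, for example 158 matches over 554 columns giving 28.5%, which makes it a convenient cross-check when a scanned percentage or count is hard to read.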



RESULT 12
ENTRY A46019 #type complete
TITLE gene Notch-1 protein - mouse
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 22-Sep-1993 #sequence_revision 18-Nov-1994 #text_change
14-Aug-1998
ACCESSIONS A46019
REFERENCE A46019
#authors del Amo, F.F.; Gendron-Maguire, M.; Swiatek, P.J.; Jenkins,
N.A.; Copeland, N.G.; Gridley, T.
#journal Genomics (1993) 15:259-264
#title Cloning, analysis, and chromosomal localization of Notch-1, a
mouse homolog of Drosophila Notch.
#cross-references MUID:93194170
#accession A46019
##status preliminary; not compared with conceptual translation
##molecule_type nucleic acid
##residues 1-2531 ##label DEL
##cross-references GB:Z11886; GB:S47228; NID:g288502; PID:g288503
##note sequence extracted from NCBI backbone (NCBIP:127318)
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology; EGF homology
FEATURE
757-788 #domain EGF homology #label EGF\
1917-1948 #domain ankyrin repeat homology #label AN1\
1949-1981 #domain ankyrin repeat homology #label AN2\
1983-2015 #domain ankyrin repeat homology #label AN3\
2016-2048 #domain ankyrin repeat homology #label AN4\
2049-2081 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2531 #molecular-weight 271312 #checksum 6611

Query Match 6.2%; Score 704; DB 2; Length 2531; 

Best Local Similarity 28.5%; Pred. No. 7.05e-100; 

Matches 158; Conservative 118; Mismatches 230; Indels 48; Gaps 38; 

Db 420 ANRCEHAGKCL-NTLGSFECQCLQGYTGPGCEIDVNECISNPCQNDATC-L-D-QIGEFQ 475 

:| I : I I : : : I I hi h: :: MM : :|| I : : I 
Qy 920 SNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFW 979 

Db 476 CICMPGYEGVYCEINTDECASSPCLHNGHCMDKIHEFQCQCPKGFNGHLCQYDVDECA-S 534 

III M 1 1 : 1 hi : I M: |:| ::: | || ; | \: :\ || 
Qy 980 CICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDFCAQD 1039 

Db 535 -TPCKNGAKCLDGPNTYTCVCTEGYTGTHCEVDIDECDPDPCHYGS-CKDGVATFTCLCQ 592 

II : MM |: : I II II I IM:MI:|: : | |: | M MM 
Qy 1040 LNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICP 1099 

Db 593 PGYTGHHCE-TN--I-— NECHSQPCRHGGTCQDRDNSYLCLCLKGTTGPNCEINLDDC 645 

Ihl II : : : I : MM: | | | Mill i Ml M 
Qy 1100 EGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCE-KL--V 1156

Db 646 ASNPCDSGTCLDKIDGYECACEPGYTGSMCNVNIDECAGSPCHNGGTCEDGIAGFTCRCP 705 

: I : : I: I : : I : : II :| :| :: : I 
Qy 1157 SVNFINKESYLQ-IPSAKVRPQTNIT-L-QIATDEDSGILLYKGDKDHIAVELYRGRVR 1212 

Db 706 EGYHDPTCLSEVNECNSNPCIHGACRD-GLNGYKCDCAPGWSGTNCDINNNEC-ESNPCV 763 

:| I : : : I : I : : : I I I I :| 
Qy 1213 ASY-DTGSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITNLSRQSTLNF 1271 

Db 764 NGGTCKD-MTS-GYVCTCREGFSGPNCQTNINECASNPCLNQGTCIDDVAGYKCNCPLPY 821 

:: I : : I : h: :| I |::: | | :| : ::| : : |: 
Qy 1272 DSPLYVGGMPGKSNVASLRQA-PGQN-GTSFHGCIRNLYIN-SE-LQD--FQ-KVPMQ- 1322 

Db 822 TGATCEWLAPCATSPCKNSGVCKESEDYESFSCVCPTGWQGQTCEVDINE-CVKSPCRH 880 

II II I : I I I : :hl Mill: h h : I I 

Qy 1323 TGILPGC--EPCHKKVCAH-GTCQPSSQ-AGFTCECQEGWMGPLCDQRTNDPCLGNKCVH 1378 

Db 881 GASCQNTNG-SYRCLCQAGYTGRNCESDID--D-CRPNPCHNGGSCT-DGINTAFCDCLP 935 

hi h II I I II I: : I : h: M I h ::|:| : 
Qy 1379 GT-CLPINAFSYSCKCLEGHGGVLCDEEEDLFNPCQAIKCKHG-KCRLSGLGQPYCECSS 1436 

Db 936 GFQGAFCEEDINEC 949

h I h :h I 
Qy 1437 GYTGDSCDREIS-C 1449 



RESULT 13 

ENTRY I50719 #type complete
TITLE C-Delta-1 - chicken
ORGANISM #formal_name Gallus gallus #common_name chicken
DATE 13-Sep-1996 #sequence_revision 13-Sep-1996 #text_change
14-Aug-1998
ACCESSIONS I50719
REFERENCE I50719
#authors Henrique, D.; Adam, J.; Myat, A.; Chitnis, A.; Lewis, J.;
Ish-Horowicz, D.
#journal Nature (1995) 375:787-790
#title Expression of a Delta homologue in prospective neurons in the
chick.
#cross-references MUID:95319507
#accession I50719
##status preliminary; translated from GB/EMBL/DDBJ
##molecule_type mRNA
##residues 1-728 ##label HEN
##cross-references EMBL:U26590; NID:g882411; PID:g882412
CLASSIFICATION #superfamily EGF homology
FEATURE
454-485 #domain EGF homology #label EGF
SUMMARY #length 728 #molecular-weight 79861 #checksum 1765

Query Match 5.9%; Score 666; DB 2; Length 728;

Best Local Similarity 39.5%; Pred. No. 7.77e-93;

Matches 94; Conservative 50; Mismatches 81; Indels 13; Gaps 9; 

Db 303 KPCKNGATCTNTGQGSYTCSCRPGYTGSSCEIEINECDANPCKNGGSC--TD-LENSYSC 359 

:IMI :ll : I M I: ! |:: |: I :||||:||:! : [::: I 
Qy 921 NPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFWC 980 

Db 360 TCPPGFYGKNCELSAMTCADGPCFNGGRCTDNPDGGYSCRCPLGYSGFNCEKKIDYCS-S 418 

: II I III:: II II : I I : |:| || |:| II |:|:|: 
Qy 981 ICADGFEGENCEVNVDDCEDNDCENNSTCVDGINN-YTCLCPPEYTGELCEEKLDFCAQD 1039 

Db 419 -SPCANGAQCVDLGNSYICQCQAGFTGRHCDDNVDDCASFPCVNGGTCQDGVNDYSCTCP 477 

:H : : h ::: |:| :|: I III : III | ||: | |:|| |:| || 
Qy 1040 LNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICP 1099 

Db 478 PGYNGKNC- -STP-V-SR- - -CEHNPCHNGATCHERSNRYVCECARGYGGLNCQFLLP 528 

11:1 MM: |:: Mi I I I :|:| I | :|: I:: 
Qy 1100 EGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEKLVS 1157 
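
The Best Local Similarity figure printed for each hit appears to be the number of identical positions (Matches) as a percentage of all aligned columns (Matches + Conservative + Mismatches + Indels). This reading is an inference from the figures in this report, not a documented definition; the function name below is hypothetical. A minimal sketch that reproduces the reported values:

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percentage of aligned columns that are exact matches.

    Assumption: the alignment length is the sum of the four counts
    printed on the "Matches ...; Indels ..." line of each result.
    """
    aligned_columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / aligned_columns, 1)


# Counts taken from the result above (Score 666, Length 728).
print(best_local_similarity(94, 50, 81, 13))
```

The same calculation with the counts of the two results that follow (94/53/79/13 and 169/106/225/51) gives their printed percentages as well.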

RESULT 14 

ENTRY S42612 #type complete 

TITLE transmembrane protein precursor - zebra fish 

ORGANISM #formal_name Brachydanio rerio #common_name zebra fish 

DATE 20-Feb-1995 #sequence_revision 20-Feb-1995 #text_change 10-Jul-1998 

ACCESSIONS S42612 
REFERENCE S42612 

#authors Bierkamp, C.; Campos-Ortega, J.A. 
#journal Mech. Dev. (1993) 43:87-100 

#title A zebrafish homologue of the Drosophila neurogenic gene Notch 
and its pattern of transcription during early 
embryogenesis. 
#cross-references MUID:94128602 
#accession S42612 

##status preliminary 
##molecule_type mRNA 
##residues 1-2437 ##label BIE 
##cross-references EMBL:X69088; NID:g433866; PID:g433867 
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin 
repeat homology 

FEATURE 

1915-1947 #domain ankyrin repeat homology #label AN1\ 

1948-1980 #domain ankyrin repeat homology #label AN2\ 

1982-2014 #domain ankyrin repeat homology #label AN3\ 

2015-2047 #domain ankyrin repeat homology #label AN4\ 

2048-2080 #domain ankyrin repeat homology #label AN5 

SUMMARY #length 2437 #molecular-weight 262306 #checksum 4021 

Query Match 5.9%; Score 667; DB 2; Length 2437; 

Best Local Similarity 39.3%; Pred. No. 5.08e-93; 
Matches 94; Conservative 53; Mismatches 79; Indels 13; Gaps 9; 

Db 791 NECASNPCLNQGSCIDDVAGF-KCNCMLPYTGEVCENVLAPCSPRPCKNGGVC--RESED 847 

I I Mil 1:1:1 I I :| I : |: |: : :| : |||:|| | :|:|: 

Qy 916 NPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEE 975 

Db 848 FQSFSCNCPAGWQGQTCEVDINECVRNPCTNGGVCENLRGGFQCRCNPGFTGALCENDID 907 

"I I I: I :h I I I : I : : I I I :|| III: :| 

Qy 976 -DGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLD 1034 

Db 908 DC--EPNPCSNGGVCQDRVNGFVCVCLAGFRGERCAEDIDECVSAPCRNGGNCTDCVNSY 965 

I : III : : I :|| I I :|: ll:| |:|:| |:||::||| ||:| 
Qy 1035 FCAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGY 1094 

Db 966 TCSCPAGFSGINCEINTPDC-TESS-C-FN--GGT-CVDGISSFSCVCLPGFTGNYCQ 1017 

II II 1:11: lh: I :| I |: I: |: |: | ||||: |: |: 

Qy 1095 TCICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCE 1153 



RESULT 15 

ENTRY A40043 #type complete 

TITLE notch protein homolog TAN-1 precursor - human 

ORGANISM #formal_name Homo sapiens #common_name man 

DATE 21-Apr-1992 #sequence_revision 21-Apr-1992 #text_change 14-Aug-1998 

ACCESSIONS A40043 
REFERENCE A40043 

#authors Ellisen, L.W.; Bird, J.; West, D.C.; Soreng, A.L.; Reynolds, 
T.C.; Smith, S.D.; Sklar, J. 
#journal Cell (1991) 66:649-661 

#title TAN-1, the human homolog of the Drosophila Notch gene, is 
broken by chromosomal translocations in T lymphoblastic 



#cross-references MUID:91347367 
#accession A40043 

##status preliminary; nucleic acid sequence not shown; not 
compared with conceptual translation 
##molecule_type mRNA 
##residues 1-2555 ##label ELL 
##cross-references GB:M73980 
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin 
repeat homology; EGF homology 

FEATURE 

1149-1180 #domain EGF homology #label EGF\ 

1927-1959 #domain ankyrin repeat homology #label AN1\ 

1960-1992 #domain ankyrin repeat homology #label AN2\ 

1994-2026 #domain ankyrin repeat homology #label AN3\ 

2027-2059 #domain ankyrin repeat homology #label AN4\ 

2060-2092 #domain ankyrin repeat homology #label AN5 

SUMMARY #length 2555 #molecular-weight 272337 #checksum 463 

Query Match 5.9%; Score 667; DB 2; Length 2555; 

Best Local Similarity 30.7%; Pred. No. 5.08e-93; 

Matches 169; Conservative 106; Mismatches 225; Indels 51; Gaps 45; 

Db 302 MPNACQNGGTCKNTHGG-YNCVCVNGWTGEDCSENIDDCASAACFHGATCH--D-RVASF 357 

::|:| I III:: I I I I |:|| MM ||:||| : :| 
Qy 919 LSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGF 978 

Db 358 YCECPHGRTGLLCHLN-DACISNPCNEGSNCDTNPVNGKAICTCPSGYTGPACSQDVDEC 416 

I I: I I Ml I |:: I I M : MM I : :|| 

Qy 979 WCICADGFEGENCEVNVDDCEDNDCENNSTC-VDGIN-NYTCLCPPEYTGELCEEKLDFC 1036 

Db 417 SLGANPCEHAGRCINTLGSFECQCLQGYTGPRCEIDVNECVSNPCQNDATCLDQIGEFQC 476 
: HI: :||| I :| |:| I I :|:|| MINIM:: 
Qy 1037 AQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTC 1096 

Db 477 MCMPGYEGVHCEVNTDEC-A-SSPC--L--HNG-RCLDKINEFQCECPTGFTGHLCQDVD 529 

: II I: II : : :||| : :|| :|: :||| |:| |: I M 
Qy 1097 ICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEKLV 1156 

Db 530 ECASTPCRNGAKCLDGPNTYTCVCTEGYTGTHCEVDIDECDPDPCHYGSCKDGVATFTCL 589 

I: : I: I:: I : I :: II h | ; || :| I 
Qy 1157 SVNFIN-KE-S-YLQIPSA-K-VRPQTNI-TL-QIATDE-DSGILLYKGDKDHIAV-E-L 1206 

Db 590 CRPGYTGHHCETNINECSSQPCRLRGTCQDPDNAYLCFCLKGTTGPNCEINLDDCASSPC 649 

II :| : :| MM: I : : :::| :::| 

Qy 1207 YR-GRVRASYDTGSHP-ASAIYSV-ETIND-GN-FHIVELL-ALDQSLSLSVD--GGNP- 1257 

Db 650 DSGTCLDKIDGYECACEPGYTGSMC-NSNIDECAGNPCHNGGTCEDGINGFTCRCP-EGY 707 

III : I I 1:1 :ll: M : : 

Qy 1258 KIITNLSKQSTLNFDS-PLYVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQDF 1316 

Db 708 HD-P--T-CLSEVNECNSNPCVHGACR-DSLNGYKCDCDPGWSGTNCDINNNE-CESNPC 761 

: I I I: : I: : I l|:|: I |: |:|: || I ||' |: I :| I 
Qy 1317 QKVPMQTGILPGCEPCHKKVCAHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKC 1376 

Db 762 VNGGTCKDMTS-GIVCTCREGFSGPNC--QTNI-NECASNPCLNKGTC-IDDVA-GYKCN 815 

1:1 II : : : I Ml I : :: I I : I : I I : :: I |: 
Qy 1377 VHG-TCLPINAFSYSCKCLEGHGGVLCDEEEDLFNPCQAIKC-KHGKCRLSGLGQPY-CE 1433 

Db 816 CLLPYTGATCE 826 
MM: 
Qy 1434 CSSGYTGDSCD 1444 

Search completed: Fri May 28 08:28:42 1999 
Job time : 177 secs. 
Release 3.1A John F. Collins, Biocomputing Research Unit. 
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm 



Fri May 28 08:29:00 1999; MasPar time 42.51 Seconds 
1014.108 Million cell updates/sec 

Tabular output not generated. 
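
For orientation, the Smith-Waterman local alignment score named in the banner can be sketched as below. This is a minimal illustration under stated assumptions, not the MPsrch implementation: `toy_score` is a hypothetical stand-in for the PAM 150 table, and the gap value of 11 is applied here as a simple per-residue (linear) penalty, which the report does not confirm.

```python
def smith_waterman_score(a, b, score, gap):
    """Best local alignment score of sequences a and b.

    score(x, y) returns the substitution score for a residue pair;
    gap is a positive per-residue penalty (linear gaps, an assumption).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + score(a[i - 1], b[j - 1])  # align a[i-1] with b[j-1]
            up = prev[j] - gap                              # gap in b
            left = curr[j - 1] - gap                        # gap in a
            curr[j] = max(0, diag, up, left)                # local alignment: never below 0
            best = max(best, curr[j])
        prev = curr
    return best


def toy_score(x, y):
    # Hypothetical scoring: +2 for identity, -1 otherwise (not PAM 150).
    return 2 if x == y else -1


print(smith_waterman_score("WWACGTWW", "ACGT", toy_score, gap=11))
```

Each inner-loop iteration fills one cell of the dynamic-programming matrix, which is the unit counted by the "cell updates/sec" figure in the banner.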



Title:          MJS-09-191-647-2 
Description:    (1-1525) from US09191647.pep 
Perfect Score:  11299 
Sequence:       1 MRGVGWQMLSLSLGLVLAIL SSFVDEVEKWKCGCTRCVS 1525 

Scoring table:  PAM 150 
                Gap 11 

Searched:       77977 seqs, 28268293 residues 
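
The throughput printed in the banner can be cross-checked from the header values. The sketch below assumes that one "cell update" is one (query residue, database residue) pair, so the total is query length times database residues; that definition is an assumption, although it reproduces the printed figure to within the rounding of the reported time.

```python
QUERY_LENGTH = 1525          # from the Sequence line
DATABASE_RESIDUES = 28268293  # from the Searched line
MASPAR_SECONDS = 42.51       # from the run line

# Assumed definition: one cell per query/database residue pair.
cell_updates = QUERY_LENGTH * DATABASE_RESIDUES
million_updates_per_sec = cell_updates / MASPAR_SECONDS / 1e6

print(f"{million_updates_per_sec:.1f} Million cell updates/sec")
```

The small difference from the printed 1014.108 is what one would expect if the program used an unrounded run time.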



Post-processing: Minimum Match 0% 

Listing first 45 summaries 



swiss-prot37 
1:swissprot 



Statistics: Mean 56.7; Variance 101.314; scale 0.560 



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



                                  SUMMARIES 

Result          Query 
   No.  Score   Match  Length  DB  ID           Description              Pred. No. 
     1   4210    37.3    1480   1  SLIT_DROME   SLIT PROTEIN PRECURSOR   0.00e+00 
     2    735     6.5    1064   1  FBP1_STRPU   FIBROPELLIN I PRECURSO   1.25e-127 
     3    726     6.4     570   1  FBP3_STRPU   FIBROPELLIN C PRECURSO   1.32e-125 
     4    719     6.4    2318   1  NTC3_MOUSE   NEUROGENIC LOCUS NOTCH   4.93e-124 
     5    714     6.3    2524   1  NOTC_XENLA   NEUROGENIC LOCUS NOTCH   6.53e-123 
     6    708     6.3    2531   1  NTC1_RAT     NEUROGENIC LOCUS NOTCH   1.45e-121 
     7    708     6.3    2703   1  NOTC_DROME   NEUROGENIC LOCUS NOTCH   1.45e-121 
     8    704     6.2    2531   1  NTC1_MOUSE   NEUROGENIC LOCUS NOTCH   1.14e-120 
     9    667     5.9    2437   1  NOTC_BRARE   NEUROGENIC LOCUS NOTCH   2.16e-112 
    10    667     5.9    2444   1  NTC1_HUMAN   NEUROGENIC LOCUS NOTCH   2.16e-112 
    11    646     5.7     723   1  DLL1_HUMAN   DELTA-LIKE PROTEIN 1 P   1.03e-107 
    12    631     5.6     714   1  DLL1_RAT     DELTA-LIKE PROTEIN 1 P   2.22e-104 
    13    627     5.5     722   1  DLL1_MOUSE   DELTA-LIKE PROTEIN 1 P   1.71e-103 
    14    622     5.5    1964   1  NTC4_MOUSE   NEUROGENIC LOCUS NOTCH   2.20e-102 
    15    609     5.4    2139   1  CRB_DROME    CRUMBS PROTEIN PRECURS   1.67e-99 
    16    554     4.9     880   1  DL_DROME     NEUROGENIC LOCUS DELTA   2.15e-87 
    17    558     4.9    1429   1  LI12_CAEEL   LIN-12 PROTEIN PRECURS   2.85e-88 
    18    494     4.4     560   1  GPV_HUMAN    PLATELET GLYCOPROTEIN    2.47e-74 
    19    490     4.3     567   1  GPV_MOUSE    PLATELET GLYCOPROTEIN    1.80e-73 
    20    478     4.2    1408   1  SERR_DROME   SERRATE PROTEIN PRECUR   6.98e-71 
    21    456     4.0     567   1  GPV_RAT      PLATELET GLYCOPROTEIN    3.67e-66 
    22    452     4.0    1295   1  GLP1_CAEEL   GLP-1 PROTEIN PRECURSO   2.63e-65 
    23    422     3.7     383   1  DLK_HUMAN    DELTA-LIKE PROTEIN PRE   6.31e-59 
    24    422     3.7    1134   1  CHAO_DROME   CHAOPTIN PRECURSOR (PH   6.31e-59 
    25    417     3.7    1959   1  AGRI_RAT     AGRIN PRECURSOR.         7.20e-58 
    26    405     3.6     385   1  DLK_MOUSE    DELTA-LIKE PROTEIN PRE   2.43e-55 
    27    395     3.5     536   1  CBP8_HUMAN   CARBOXYPEPTIDASE N 83    3.06e-53 
    28    393     3.5    4393   1  PGBM_HUMAN   BASEMENT MEMBRANE-SPEC   8.02e-53 
    29    370     3.3     357   1  PGS2_CHICK   BONE PROTEOGLYCAN II P   4.94e-48 
    30    375     3.3     360   1  PGS2_BOVIN   BONE PROTEOGLYCAN II P   4.54e-49 
    31    373     3.3     360   1  PGS2_CANFA   BONE PROTEOGLYCAN II P   1.18e-48 
    32    377     3.3    3707   1  PGBM_MOUSE   BASEMENT MEMBRANE-SPEC   1.74e-49 
    33    366     3.2     359   1  PGS2_HUMAN   BONE PROTEOGLYCAN II P   3.33e-47 
    34    358     3.2     603   1  ALS_RAT      INSULIN-LIKE GROWTH FA   1.49e-45 
    35    365     3.2    1955   1  AGRI_CHICK   AGRIN PRECURSOR.         5.36e-47 
    36    351     3.1     354   1  PGS2_MOUSE   BONE PROTEOGLYCAN II P   4.10e-44 
    37    346     3.1     603   1  ALS_MOUSE    INSULIN-LIKE GROWTH FA   4.34e-43 
    38    351     3.1     605   1  ALS_HUMAN    INSULIN-LIKE GROWTH FA   4.10e-44 
    39    342     3.0     354   1  PGS2_RAT     BONE PROTEOGLYCAN II P   2.86e-42 
    40    340     3.0     361   1  CHAD_BOVIN   CHONDROADHERIN PRECURS   7.32e-42 
    41    335     3.0     368   1  PGS1_HUMAN   BONE/CARTILAGE PROTEOG   7.64e-41 
    42    336     3.0     369   1  PGS1_RAT     BONE/CARTILAGE PROTEOG   4.78e-41 
    43    341     3.0     605   1  ALS_PAPHA    INSULIN-LIKE GROWTH FA   4.57e-42 
    44    333     2.9     369   1  PGS1_MOUSE   BONE/CARTILAGE PROTEOG   1.95e-40 
    45    333     2.9     682   1  CONN_DROME   CONNECTIN PRECURSOR.     1.95e-40 
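
The Query Match column in the summaries appears to be each hit's score expressed as a percentage of the Perfect Score from the run header (11299 for this query). That relationship is inferred from the figures in this report rather than taken from program documentation, and the function name is hypothetical. A short sketch:

```python
PERFECT_SCORE = 11299  # "Perfect Score" printed in the run header


def query_match_percent(score, perfect_score=PERFECT_SCORE):
    """Score as a percentage of the query's self-match score (assumed definition)."""
    return round(100.0 * score / perfect_score, 1)


# Scores of results 1, 2 and 9 in the summaries.
for score in (4210, 735, 667):
    print(score, query_match_percent(score))
```

Under the same assumption, the percentage can be recomputed for any row whose Query Match cell is illegible in the scan.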



RESULT 1 
ID SLIT_DROME STANDARD; PRT; 1480 AA. 
AC P24014; 
DT 01-MAR-1992 (REL. 21, CREATED) 
DT 01-MAR-1992 (REL. 21, LAST SEQUENCE UPDATE) 
DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE) 
DE SLIT PROTEIN PRECURSOR. 
GN SLI. 
OS DROSOPHILA MELANOGASTER (FRUIT FLY). 
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 
OC DROSOPHILIDAE; DROSOPHILA. 
RN [1] 
RP SEQUENCE FROM N.A. 
RX MEDLINE; 91099665. 
RA ROTHBERG J.M., JACOBS J.R., GOODMAN C.S., ARTAVANIS-TSAKONAS S.; 
RT "Slit: an extracellular protein necessary for development of midline 
RT glia and commissural axon pathways contains both EGF and LRR 



RL GENES DEV. 4:2169-2187(1990). 
CC -!- FUNCTION: NECESSARY FOR DEVELOPMENT OF MIDLINE GLIA AND 
CC COMMISSURAL AXON PATHWAYS. SLIT MAY INTERACT WITH EXTRACELLULAR 
CC MATRIX MOLECULES. 
CC -!- TISSUE SPECIFICITY: EXCRETED BY THE MIDLINE GLIA CELLS AND 
CC EVENTUALLY DISTRIBUTED ALONG THE AXONS. 
CC -!- ALTERNATIVE PRODUCTS: GIVES RISE TO 2 DISTINCT PROTEINS DIFFERING 
CC BY 11 AA AT THE C-TERMINUS OF THE LAST EGF REPEAT. 
CC -!- SIMILARITY: CONTAINS 7 EGF-LIKE DOMAINS. 
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN 
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 22, TWO BLOCKS OF 6 LRR'S 
CC AND TWO BLOCKS OF 5 LRR'S. 
CC -!- SIMILARITY: CONTAINS A C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK). 
CC 
CC This SWISS-PROT entry is copyright. It is produced through a collaboration 
CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 
CC the European Bioinformatics Institute. There are no restrictions on its 
CC use by non-profit institutions as long as its content is in no way 
CC modified and this statement is not removed. Usage by and for commercial 
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 
CC or send an email to license@isb-sib.ch). 
CC 
DR EMBL; X53959; G8615; -. 
DR PIR; A36665; A36665. 
DR FLYBASE; FBgn0003425; sli. 
DR PROSITE; PS00010; ASX_HYDROXYL; 3. 
DR PROSITE; PS00022; EGF_1; 7. 
DR PROSITE; PS01185; CTCK_1; 1. 
DR PROSITE; PS01186; EGF_2; 5. 
DR PROSITE; PS01187; EGF_CA; 2. 
DR PROSITE; PS01225; CTCK_2; 1. 
DR PFAM; PF00007; Cys_knot; 1. 
DR PFAM; PF00008; EGF; 7. 
DR PFAM; PF00054; laminin_G; 1. 
DR PFAM; PF00560; LRR; 10. 
DR HSSP; P00740; 1IXA. 
KW NEUROGENESIS; GLYCOPROTEIN; SIGNAL; ALTERNATIVE SPLICING; 
KW EGF-LIKE DOMAIN; REPEAT; LEUCINE-REPEAT; DUPLICATION. 
FT SIGNAL        1    36 
FT CHAIN        37  1480  SLIT PROTEIN. 
FT DOMAIN       70   104  CONSERVED N-FLANKING REGION OF THE LRR. 
FT DOMAIN      105   230  LEUCINE-RICH REPEATS (1ST REGION). 
FT DOMAIN      231   294  CONSERVED C-FLANKING REGION OF THE LRR. 
FT DOMAIN      295   326  CONSERVED N-FLANKING REGION OF THE LRR. 
FT DOMAIN      327   452  LEUCINE-RICH REPEATS (2ND REGION). 
FT DOMAIN      453   518  CONSERVED C-FLANKING REGION OF THE LRR. 
FT DOMAIN      519   550  CONSERVED N-FLANKING REGION OF THE LRR. 
FT DOMAIN                 LEUCINE-RICH REPEATS (3RD REGION). 
FT DOMAIN      654   714  CONSERVED C-FLANKING REGION OF THE LRR. 
FT DOMAIN      715   746  CONSERVED N-FLANKING REGION OF THE LRR. 
FT DOMAIN      747   848  LEUCINE-RICH REPEATS (4TH REGION). 
FT DOMAIN                 CONSERVED C-FLANKING REGION OF THE LRR. 
FT REPEAT                 LRR 1-1. 
FT REPEAT      116   139  LRR 1-2. 
FT REPEAT      140   163  LRR 1-3. 
FT REPEAT                 LRR 1-4. 
FT REPEAT      188   211  LRR 1-5. 
FT REPEAT                 LRR 1-6. 
FT REPEAT      327   337  LRR 2-1. 
FT REPEAT      338   361  LRR 2-2. 
FT REPEAT                 LRR 2-3. 
FT REPEAT      386   409  LRR 2-4. 
FT REPEAT      410   433  LRR 2-5. 
FT REPEAT                 LRR 2-6. 
FT REPEAT      551   562  LRR 3-1. 
FT REPEAT      563   586  LRR 3-2. 
FT REPEAT      587        LRR 3-3. 
FT REPEAT            634  LRR 3-4. 
FT REPEAT      635   653  LRR 3-5. 
FT REPEAT            757  LRR 4-1. 
FT REPEAT                 LRR 4-2. 
FT REPEAT      782   805  LRR 4-3. 
FT REPEAT      806   829  LRR 4-4. 
FT REPEAT            848  LRR 4-5. 
FT DOMAIN                 EGF-LIKE 1. 
FT DOMAIN      946   983  EGF-LIKE 2. 
FT DOMAIN      985        EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL). 
FT DOMAIN     1024  1062  EGF-LIKE 4. 
FT DOMAIN     1064  1100  EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL). 
FT DOMAIN     1111  1149  EGF-LIKE 6. 
FT DOMAIN     1353  1392  EGF-LIKE 7. 
FT DOMAIN     1409  1480  CTCK. 
FT VARSPLIC   1394  1404  MISSING (IN SHORT FORM). 
FT CARBOHYD    111   111  POTENTIAL. 
FT CARBOHYD    207   207  POTENTIAL. 
FT CARBOHYD    357   357  POTENTIAL. 
FT CARBOHYD    435   435  POTENTIAL. 
FT CARBOHYD    783   783  POTENTIAL. 
FT CARBOHYD    788   788  POTENTIAL. 
FT CARBOHYD    958   958  POTENTIAL. 
FT CARBOHYD    998   998  POTENTIAL. 
FT CARBOHYD   1060  1060  POTENTIAL. 
FT CARBOHYD   1159  1159  POTENTIAL. 
FT CARBOHYD   1175  1175  POTENTIAL. 
FT CARBOHYD   1243  1243  POTENTIAL. 
FT CARBOHYD   1292  1292  POTENTIAL. 
FT DISULFID    911   922  BY SIMILARITY. 
FT DISULFID    916   932  BY SIMILARITY. 
FT DISULFID    934   943  BY SIMILARITY. 
FT DISULFID    950   961  BY SIMILARITY. 
FT DISULFID    955   971  BY SIMILARITY. 
FT DISULFID    973   982  BY SIMILARITY. 
FT DISULFID    989  1001  BY SIMILARITY. 
FT DISULFID    995  1010  BY SIMILARITY. 
FT DISULFID   1012  1021  BY SIMILARITY. 
FT DISULFID   1028  1041  BY SIMILARITY. 
FT DISULFID   1035  1050  BY SIMILARITY. 
FT DISULFID   1052  1061  BY SIMILARITY. 
FT DISULFID   1068  1079  BY SIMILARITY. 
FT DISULFID   1073  1088  BY SIMILARITY. 
FT DISULFID   1090  1099  BY SIMILARITY. 
FT DISULFID   1115  1125  BY SIMILARITY. 
FT DISULFID   1120  1137  BY SIMILARITY. 
FT DISULFID   1139  1148  BY SIMILARITY. 
FT DISULFID   1357  1368  BY SIMILARITY. 
FT DISULFID   1362  1380  BY SIMILARITY. 
FT DISULFID   1382  1391  BY SIMILARITY. 
FT DISULFID   1409  1443  BY SIMILARITY. 
FT DISULFID   1423  1457  BY SIMILARITY. 
FT DISULFID   1434  1473  BY SIMILARITY. 
FT DISULFID   1438  1475  BY SIMILARITY. 
FT DISULFID   1442  1479  BY SIMILARITY. 
SQ SEQUENCE 1480 AA; 165752 MW; 2CD1C421 CRC32; 

Query Match 37.3%; Score 4210; DB 1; Length 1480; 

Best Local Similarity 44.5%; Pred. No. 0.00e+00; 

Matches 600; Conservative 328; Mismatches 369; Indels 52; Gaps 33; 

Db 105 LELQGNNLTVIYETDFQRLTKLRMLQLTDNQIHTIERNSFQDLVSLERLDISNNVITTVG 164 

1:1 ::: 1 II 1 1 hi 1::: : 1 1 MM: | :: 
Qy 84 LQLMENKISTIERGAFQDLKELERLRLNRNHLQLFPELLFLGTAKLYRLDLSENQIQAIP 143 

Db 165 RRVTKGAQSLRSLQLDNNQITCLDEHAFKGLVELEILTLNNNNLTSLPHNIFGGLGRLRA 224 

1: Ml ::;|||| Mhl::: ||::| : 1 1 : 1 1 1 1 1 1 1 : 1 |: 1 : :||; 
Qy 144 RKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTINNNNITRLSVASFNHMPKLRT 203 

Db 225 LRLSDNPFACDCHLSWLSRFLRSATRLAPYTRCQSPSQLKGQNVADLHDQEFKCSGLTE- 283 

:|| : llllhlll II 1:: Ml : 1 1 ; | : | : 1 1 1 : : ; :|| II | 
Qy 204 FRLHSNNLYCDCHLAWLSDKLRKRPRVGLYTQCMGPSHLRGHPAEVQKREFVCSDEEEG 263 

Db 284 HAP-MECG-AENSCPHPCRCADGIVDCREKSLTSVPVTLPDDTTDVRLEQNFITELPPKS 341 

1 : 1 : : II :| M Mill Ml :| II: MIM I :|| : 
Qy 264 HQSFMAPSCSVLHCPAACTCSNNIVDCRGKGLTEIPTNLPETITEIRLEQNTIKVIPPGA 323 

Db 342 FSSFRRLRRIDLSNNNISRIAHDALSGLKQLTTLVLYGNKIKDLPSGVFKGLGSLRLLLL 401 

II::: MUM II :| II: II: 1 MMM :|| M II ||:|||l 
Qy 324 FSPYKKLRRIDLSNNQISELAPDAFQGLRSLNSLVLYGNKITELPKSLFEGLFSLQLLLL 383 

Db 402 NANEISCIRKDAFRDLHSLSLLSLYDNNIQSLANGTFDAMKSMKTVHLAKNPFICDCNLR 461 

III 1 hi llhllhhlllllll::|::hlll ::::: hill IIIIIM; 
Qy 384 NANKINCLRVDAFQDLHNLNLLSLYDNKLQTIAKGTFSPLRAIQTMHLAQNPFICDCHLK 443 

Db 462 WLADYLHKNPIETSGARCESPKRMHRRRIESLREEKFKCS-WGELRMKLSGECRMDSDCP 520 

Mill 1 llllllllll hi: :H :: Ihll : 1 lllhl 1 II 
Qy 444 WLADYLHTNPIETSGARCTSPRRLANKRIGQIKSKKFRCSGTEDYRSKLSGDCFADLACP 503 

Db 521 AMCHCEGTTVDCTGRRLKEIPRDIPLHTTELLLNDNELGRISSDGLFGRLPHLVKLELKR 580 

:hl MIM ::|: 1 1 1:11 Ml: : : hi :||:| |::: 
Qy 504 EKCRCEGTTVDCSNQKLNKIPEHIPQYTAELRLNNNEFTVLEATGIFKKLPQLRKINFSN 563 

Db 581 NQLTGIEPNAFEGASHIQE-L— QL-G-ENKI-K-E I-SNKM FL 616 

1 :| 1 Mill 1 1 :l ::h 1 1 : Ih: h 
Qy 564 NKITDIEEGAFEGASGVNEILLTSNRLENVQHKMFKGLESLKTLMLRSNRITCVGNDSFI 623 

Db 617 GLHQLKTLNLYDNQISCVMPGSFEHLNSLTSLNLASNPFNCNCHLAWFAECVRKKSLNGG 676 

II : hlllllh 1 Ihh 1:1 MINI! Mi:: ;||| ; I 
Qy 624 GLSSVRLLSLYDNQITTVAPGAFDTLHSLSTLNLLANPFNCNCYLAWLGEWLRKKRIVTG 683 

Db 677 AARCGAPSKVRDVQIKDLPHSEFKCSSENSE-GCLGDGYCPPSCTCTGTVVACSRNQLKE 735 

:|l 1 :::: 1 1:: :| 1 1 : :| : II III III II : II 
Qy 684 NPRCQKPYFLKEIPIQDVAIQDFTCDDGNDDNSCSPLSRCPTECTCLDTVVRCSNKGLKV 743 

Db 736 IPRGIPAETSELYLESNEIEQIHYERIRHLRSLTRLDLSNNQITILSNYTFANLTKLSTL 795 

:|:IH : :llll::|:: : 1 : : : II :lllll:|: III :|:|:l 1 II ' 
Qy 744 LPKGIPRDVTELYLDGNQFTLVPKE-LSNYKHLTLIDLSNNRISTLSNQSFSNMTQLLTL 802 

Db 796 IISYNKLQCLQRHALSGLNNLRVVSLHGNRISMLPEGSFEDLKSLTHIALGSNPLYCDCG 855 

|:|||:|:|: ::: 1 1 : : 1 1 : : 1 1 1 1 1 ||::|||:|:|| : 1 : 1 : 1 : 1 : 1 1 1 1 1 1 1 
Qy 803 ILSYNRLRCIPPRTFDGLKSLRLLSLHGNDISVVPEGAFNDLSALSHLAIGANPLYCDCN 862 

Db 856 LKWFSDWIKLDYVEPGIARCAEPEQMKDKLILSTPSSSFVCRGRVRNDILAKCNACFEQP 915 

: 1-111:1 H llllllll I :| I hi I :||||||:|; | 

Qy 863 MQWLSDWVKSEYKEPGIARCAGPGEMADKLLLTTPSKKFTCQGPVDVNILAKCNPCLSNP 922 

Db 916 CQNQAQCVALPQREYQCLCQPGYHGKHCEFMIDACYGNPCRNNATCTVLE--EGRFSCQC 973 

I I:: I : I hi I hi \- III :|||:: :|| : I I III 
Qy 923 CKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFWCIC 982 

Db 974 APGYTGARCETNIDDCLGEIKCQNNATCIDGVESYKCECQPGFSGEFCDTKIQFCSPEFN 1033 

I I: I II hill : 1 : 1 1 : 1 1 : 1 1 : : : I I I I ::||:|: |::||: ::| 

Qy 983 ADGFEGENCEVNVDDC-EDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDFCAQDLN 1041 

Db 1034 PCANGAKCMDHFTHYSCDCQAGFHGTNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDD 1093 

II : :||: : III :|: I :| ::||||:::| II: I |::| I I ||: 

Qy 1042 PCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICPEG 1101 

Db 1094 YTGKYCEGHNMISMMYPQTSPCQNHECKHGVCFQPNAQGSDYLCRCHPGYTGKWCEYLTS 1153 

1:1 :ll : :h hlllhl :| :| I : :: :|:| III Mill 
Qy 1102 YSGLFCE-FSP-PMVLPRTSPCDNFDCQNGA--QCIVRINEPICQCLPGYQGEKCEKLVS 1157 

Db 1154 ISFVHNNSFVELEPLRTRPEANVTIVFSSAEQNGILMYDGQDAHLAVELFNGRIRVSYDV 1213 

::|::::|:::= : : lh:|:|: ::: |::|||:| |: |:||||: ||:| ||| 
Qy 1158 VNFINKESYLQIPSAKVRPQTNITLQIATDEDSGILLYKGDKDHIAVELYRGRVRASYDT 1217 

Db 1214 GNHPVSTMYSFEMVADGKYHAVELLAIKKNFTLRVDRGLARSIINEGSNDYLKLTTPMFL 1273 
hll l::ll I : ll::| Mill: :::| II I :: I | : |:; :|::: 
Qy 1218 GSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITNLSKQSTLNFDSPLYV 1277 

Db 1274 GGLPVDPAQQAYKNWQIRNLTSFKGCMKEVWINHKLVDFGNAQRQQKITPGCAL-LEGE- 1331 
11:1 : : : :l III II:::: II I II : II III 
Qy 1278 GGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQDFQKVPMQTGILPGCEPCHKKVC 1337 

Db 1332 QQEE-EDDEQD-FM-D--ET--PHIKEEPV-DPCLENKCRRGSRCVPNSNARDGYQCKCK 1383 

: : I I : I : :: Nil III :|: |:| II :| III 
Qy 1338 AHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKCVHGT-CLPI-NAF-SYSCKCL 1394 



Db 1384 HGQRGRYCDQGEGSTEP-PTVT-AASTCR 1410 

I: I II: I :| :: : II 
Qy 1395 EGHGGVLCDEEEDLFNPCQAIKCKHGKCR 1423 



RESULT 2 
ID FBP1_STRPU STANDARD; PRT; 1064 AA. 
AC P10079; 
DT 01-MAR-1989 (REL. 10, CREATED) 
DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE) 
DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE) 
DE FIBROPELLIN I PRECURSOR (EPIDERMAL GROWTH FACTOR-RELATED PROTEIN 1) 
DE (UEGF-1). 
GN EGF1. 
OS STRONGYLOCENTROTUS PURPURATUS (PURPLE SEA URCHIN). 
OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; 
OC EUECHINOIDEA; ECHINACEA; ECHINOIDA; STRONGYLOCENTROTIDAE; 
OC STRONGYLOCENTROTUS. 
RN [1] 
RP SEQUENCE FROM N.A. 
RX MEDLINE; 90112459. 
RA DELGADILLO-REYNOSO M.G., ROLLO D.R., HURSH D.A., RAFF R.A.; 
RT "Structural analysis of the uEGF gene in the sea urchin 
RT strongylocentrotus purpuratus reveals more similarity to vertebrate 
RT than to invertebrate genes with EGF-like repeats."; 
RL J. MOL. EVOL. 29:314-327(1989). 
RN [2] 
RP SEQUENCE OF 279-476 AND 781-1064 FROM N.A. 
RX MEDLINE; 87319677. 
RA HURSH D.A., ANDREWS M.E., RAFF R.A.; 
RT "A sea urchin gene encodes a polypeptide homologous to epidermal 


RT growth factor."; 

RL SCIENCE 237:1487-1490(1987). 

RN [3] 

RP AVIDIN-LIKE DOMAIN. 

RX MEDLINE; 89196806. 

RA HUNT L.T., BARKER W.C.; 

RT "Avidin-like domain in an epidermal growth factor homolog from a sea 

RT urchin."; 

RL FASEB J. 3:1760-1764(1989). 

RN [4] 

RP CHARACTERIZATION. 

RX MEDLINE; 91285254. 

RA BISGROVE B.W., ANDREWS M.E., RAFF R.A.; 

RT "Fibropellins, products of an EGF repeat-containing gene, form a 

RT unique extracellular matrix structure that surrounds the sea urchin 

RT embryo."; 

RL DEV. BIOL. 146:89-99(1991). 

CC -!- FUNCTION: FORM THE APICAL LAMINA, A COMPONENT OF THE EXTRACELLULAR 
CC MATRIX. 

CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR. IN VESICLES IN THE CYTOPLASM 
CC OF UNFERTILIZED EGGS, THEN TO THE BASE OF THE HYALIN LAYER 
CC THROUGHOUT DEVELOPMENT AND FINALLY IN THE APICAL LAMINA IN LATE 
CC EMBRYOS AND EARLY LARVAE. 

CC -!- DEVELOPMENTAL STAGE: MODERATE LEVELS IN UNFERTILIZED EGGS AND 
CC DURING EARLY CLEAVAGE, THEN RAPIDLY INCREASES IN ABUNDANCE BETWEEN 

CC LATE MORULA AND MESENCHYME BLASTULA STAGES TO MAXIMAL LEVELS 
CC MAINTAINED THROUGH SUBSEQUENT STAGES. EXPRESSED BOTH MATERNALLY 
CC AND ZYGOTICALLY. 

CC -!- ALTERNATIVE PRODUCTS: TWO FORMS (IA AND IB) ARE PRODUCED BY 
CC ALTERNATIVE SPLICING. THE SMALL FORM (IB) LACKS 8 EGF REPEATS. 

CC -!- SIMILARITY: CONTAINS 21 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: CONTAINS 1 CUB DOMAIN. 

CC -!- SIMILARITY: THE C-TERMINAL DOMAIN OF THIS PROTEIN IS SIMILAR 
CC TO AVIDIN/STREPTAVIDIN. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; L08692; G161467; -. 

DR EMBL; L08692; G161466; -. 

DR EMBL; X17530; G667061; -. 

DR EMBL; M17421; G552260; -. 

DR EMBL; X17533; G667062; -. 

DR PIR; A29316; A29316. 

DR PROSITE; PS00010; ASX_HYDROXYL; 19. 

DR PROSITE; PS00022; EGF_1; 19. 

DR PROSITE; PS00577; AVIDIN; 1. 

DR PROSITE; PS01180; CUB; 1. 

DR PROSITE; PS01186; EGF_2; 19. 

DR PROSITE; PS01187; EGF_CA; 19. 

DR PFAM; PF00008; EGF; 21. 

DR PFAM; PF00431; CUB; 1. 

DR HSSP; P01132; 1EPH. 

KW BIOTIN; ALTERNATIVE SPLICING; EGF-LIKE DOMAIN; REPEAT; SIGNAL; 

KW GLYCOPROTEIN. 



FT 


SIGNAL 


1 


19 


POTENTIAL. 


FT 


CHAIN 


20 


1064 


FIBROPELLIN I, 


FT 


DOMAIN 


20 


55 


EGF-LIKE 1. 


FT 


DOMAIN 


62 


175 


CUB. 


FT 


DOMAIN 


176 


212 


EGF-LIKE 2, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


214 


250 


EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL), 


FT 


DOMAIN 


252 


288 


EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


290 


326 


EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


328 


364 


EGF-LIKE 6, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


366 


402 


EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


404 


440 


EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


442 


478 


EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL), 


FT 


DOMAIN 


480 


516 


EGF-LIKE 10, CALCIUM-BINDING (POTENTIAL) 
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FT 


DOMAIN 


518 


554 


EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 907 922 BY SIMILARITY. 


FT 


DOMAIN 


556 


592 


EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 924 933 BY SIMILARITY. 


FT 


DOMAIN 


594 


630 


EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL). 


FT 


VARSPLIC 477 780 MISSING (IN FORM IB) . 


FT 


DOMAIN 


632 


668 


EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL). 


FT 


CARBOHYD 30 30 POTENTIAL. 


FT 


DOMAIN 


670 


706 


EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL). 


FT 


CARBOHYD 136 136 POTENTIAL. 


FT 


DOMAIN 


708 


744 


EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL). 


FT 


CARBOHYD 851 851 POTENTIAL. 


FT 


DOMAIN 


746 


782 


EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL). 


FT 


CONFLICT 279 279 L -> S (IN REF, 2) . 


FT 


DOMAIN 


784 


820 


EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL) . 


SO. 


SEQUENCE 1064 AA; 112072 MW; FBD10D48 CRC32; 


FT 


DOMAIN 


822 


858 


EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL) . 






FT 


DOMAIN 


860 


896 


EGF-LIKE 20. 


Query Match 6,5*; Score 735; DB 1; Length 1064; 


FT 


DOMAIN 


898 


934 


EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL). 


Best Local Similarity 44.5%; Pred. No. l,25e-127; 


FT 


DOMAIN 


936 


1064 


AVIDIN-LIKE. 


Matches 106; Conservative 38; Mismatches 81; Indels 13; Gaps 9; 


FT 


DISULFID 


23 


34 


BY SIMILARITY. 






FT 


DISULFID 


28 


43 


BY SIMILARITY. 


Db 


254 NECASSPCLNGGIC - VDGVNMFECTCLAGFTGVRCEVNIDECASAPCQNGGIC - I - DGI - 309 


FT 


DISULFID 


45 


54 


BY SIMILARITY. 




1 1 hll III 1 1: : III II 1 hi 1 1 1 II :|| 1 : :| 


FT 


DISULFID 


180 


191 


BY SIMILARITY. 


Qy 


916 NPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEE 975 


FT 


DISULFID 


185 


200 


BY SIMILARITY. 






FT 


DISULFID 


202 


211 


BY SIMILARITY. 


Db 


310 NGYTCSCPLGFSGDNCENNDDECSSIPCLNGGTCVDLVNAYMCVCAPGWTGPTCADNIDE 369 


FT 


DISULFID 


218 


229 


BY SIMILARITY. 




:|: 1 1: II hill 1 hi II :|lll :| 1 hhl II 1 


FT 


DISULFID 


223 


238 


BY SIMILARITY. 


Qy 


976 DGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDF 1035 




DISULFID 


240 


249 


BY SIMILARITY. 




1 


DISULFID 


256 


267 


BY SIMILARITY, 


Db 


370 CA-S-APCQNGGVCIDGVNGYMCDCQPGYTGTHCETDIDECARPPCQNGGDCVDGVNGYV 427 


w 


DISULFID 


261 


276 


BY SIMILARITY. 




II < III: : II :h:||| III l lh hhl 1 lh 1 hllll 


FT 


DISULFID 


278 


287 


BY SIMILARITY, 


Qy 


1036 CAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYT 1095 


FT 


DISULFID 


294 


305 


BY SIMILARITY, 






FT 


DISULFID 


299 


314 


BY SIMILARITY. 


Db 


428 CICAPGFDGLNCE-NN- -I DECASRPCQNGAVCVDGVNGFVCTCSAGYTGVLCE 478 


FT 


DISULFID 


316 


325 


BY SIMILARITY, 




III: 1: II II : : 1 : Mill h :l :l 1 :|| 1 II 


FT 


DISULFID 


332 


343 


BY SIMILARITY. 


Qy 


1096 CICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCE 1153 


FT 


DISULFID 


337 


352 


BY SIMILARITY. 






FT 


DISULFID 


354 


363 


BY SIMILARITY. 






FT 


DISULFID 


370 


381 


BY SIMILARITY. 


RESULT 3 


FT 


DISULFID 


375 


390 


BY SIMILARITY. 


ID 


FBP3JTRPU STANDARD; PRT; 570 AA. 


FT 


DISULFID 


392 


401 


BY SIMILARITY. 


AC 


P49013; 


FT 


DISULFID 


408 


419 


BY SIMILARITY. 


DT 


01-FEB-1996 (REL. 33, CREATED) 


FT 


DISULFID 


413 


428 


BY SIMILARITY. 


DT 


01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE) 


FT 


DISULFID 


430 


439 


BY SIMILARITY, 


DT 


01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE) 


FT 


DISULFID 


446 


457 


BY SIMILARITY, 


DE 


FIBROPELLIN C PRECURSOR (EPIDERMAL GROWTH FACTOR- RELATED PROTEIN 3) 


FT 


DISULFID 


451 


466 


BY SIMILARITY. 


DE 


(EGF III) (FIBROPELLIN III). 


FT 


DISULFID 


468 


477 


BY SIMILARITY. 


GN 


EGF3, 


FT 


DISULFID 


484 


495 


BY SIMILARITY. 


OS 


STRONGYLOCENTROTUS PURPURATUS (PURPLE SEA URCHIN), 


FT 


DISULFID 


489 


504 


BY SIMILARITY, 


OC 


EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; 


FT 


DISULFID 


506 


515 


BY SIMILARITY. 


OC 


EUECHINOIDEA; ECHINACEA; ECHINOIDA; STRONGYLOCENTROTIDAE; 


FT 


DISULFID 


522 


533 


BY SIMILARITY. 


OC 


STRONGYLOCENTROTUS. 


FT 


DISULFID 


527 


542 


BY SIMILARITY, 


RH 


[1] 


FT 


DISULFID 


544 


553 


BY SIMILARITY, 


RP 


SEQUENCE FROM N.A. 


FT 


DISULFID 


560 


571 


BY SIMILARITY. 


RC 


TISSUE=GASTRULA; 


FT 


DISULFID 


565 


580 


BY SIMILARITY. 


RX 


MEDLINE; 93273088. 


FT 


DISULFID 


582 


591 


BY SIMILARITY. 


RA 


BISGROVE B.W., RAFF R.A.; 


cr 


DISULFID 


598 


609 


BY SIMILARITY. 


RT 


"The SpEGF III gene encodes a member of the fibropellins: EGF repeat- 


m 


DISULFID 


603 


618 


BY SIMILARITY. 


RT 


containing proteins that form the apical lamina of the sea urchin 
embryo."; 


V 


DISULFID 


620 


629 


BY SIMILARITY, 


RT 


FT 


DISULFID 


636 


647 


BY SIMILARITY, 


RL 


DEV. BIOL. 157:526-538(1993). 


FT 


DISULFID 


641 


656 


BY SIMILARITY, 


CC 


■!- FUNCTION; FORM THE APICAL LAMINA, A COMPONENT OF THE EXTRACELLULAR 


FT 


DISULFID 


658 


667 


BY SIMILARITY, 


CC 


MATRIX. 


FT 


DISULFID 


674 


685 


BY SIMILARITY. 


CC 


-1- SUBCELLULAR LOCATION: EXTRACELLULAR. 


FT 


DISULFID 


679 


694 


BY SIMILARITY. 


CC 


■!- DEVELOPMENTAL STAGE: LOW LEVELS IN UNFERTILIZED EGGS AND DURING 


FT 


DISULFID 


696 


705 


BY SIMILARITY. 


CC 


EARLY CLEAVAGE, THEN RAPIDLY INCREASES IN ABUNDANCE BETWEEN LATE 


FT 


DISULFID 


712 


723 


BY SIMILARITY. 


CC 


MORULA AND MESENCHYME BLASTULA STAGES TO MAXIMAL LEVELS MAINTAINED 


FT 


DISULFID 


717 


732 


BY SIMILARITY.


CC 


THROUGH SUBSEQUENT STAGES. 


FT 


DISULFID 


734 


743 


BY SIMILARITY.


CC 


-!- EXPRESSED BOTH MATERNALLY AND ZYGOTICALLY. 


FT 


DISULFID 


750 


761


BY SIMILARITY.


CC 


-!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS. 


FT 


DISULFID 


755 


770 


BY SIMILARITY.


CC 


-!- SIMILARITY: CONTAINS 1 CUB DOMAIN.


FT 


DISULFID 


772 


781 


BY SIMILARITY.


CC 


-!- SIMILARITY: THE C-TERMINAL DOMAIN OF THIS PROTEIN IS SIMILAR


FT 


DISULFID 


788 


799 


BY SIMILARITY.


CC 


TO AVIDIN/STREPTAVIDIN. 


FT 


DISULFID 


793 


808 


BY SIMILARITY.


CC 




FT 


DISULFID 


810 


819 


BY SIMILARITY.


CC 


This SWISS-PROT entry is copyright. It is produced through a collaboration


FT 


DISULFID 


826 


837 


BY SIMILARITY.


CC


between the Swiss Institute of Bioinformatics and the EMBL outstation - 


FT 


DISULFID 


831 


846 


BY SIMILARITY.


CC 


the European Bioinformatics Institute. There are no restrictions on its 


FT 


DISULFID 


848 


857 


BY SIMILARITY. 


CC 


use by non-profit institutions as long as its content is in no way 


FT 


DISULFID 


864 


875 


BY SIMILARITY. 


CC 


modified and this statement is not removed. Usage by and for commercial 


FT 


DISULFID 


869 


884 


BY SIMILARITY.


CC 


entities requires a license agreement (See http://www.isb-sib.ch/announce/


FT 


DISULFID 


886 


895 


BY SIMILARITY. 


CC 


or send an email to license@isb-sib.ch).


FT 


DISULFID 


902 


913 


BY SIMILARITY. 


CC 
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DR EMBL; L07045; G310660; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 8.

DR PROSITE; PS00022; EGF_1; 8.

DR PROSITE; PS00577; AVIDIN; 1.

DR PROSITE; PS01180; CUB; 1.

DR PROSITE; PS01186; EGF_2; 7.

DR PROSITE; PS01187; EGF_CA; 6.

DR PFAM; PF00008; EGF; 8.

DR PFAM; PF00431; CUB; 1.

DR HSSP; P00740; 1IXA. 

KW BIOTIN; EGF-LIKE DOMAIN; REPEAT; SIGNAL; GLYCOPROTEIN.



FT SIGNAL 1 17 POTENTIAL.
FT CHAIN 18 570 FIBROPELLIN C.
FT DOMAIN 18 55 EGF-LIKE 1.
FT DOMAIN 62 175 CUB.
FT DOMAIN 176 212 EGF-LIKE 2, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 214 250 EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 252 288 EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 290 326 EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 328 364 EGF-LIKE 6, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 366 402 EGF-LIKE 7.
FT DOMAIN 404 440 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 442 570 AVIDIN-LIKE.
FT DISULFID 23 34 BY SIMILARITY.
FT DISULFID 28 43 BY SIMILARITY.
FT DISULFID 45 54 BY SIMILARITY.
FT DISULFID 180 191 BY SIMILARITY.
FT DISULFID 185 200 BY SIMILARITY.
FT DISULFID 202 211 BY SIMILARITY.
FT DISULFID 218 229 BY SIMILARITY.
FT DISULFID 223 238 BY SIMILARITY.
FT DISULFID 240 249 BY SIMILARITY.
FT DISULFID 256 267 BY SIMILARITY.
FT DISULFID 261 276 BY SIMILARITY.
FT DISULFID 278 287 BY SIMILARITY.
FT DISULFID 294 305 BY SIMILARITY.
FT DISULFID 299 314 BY SIMILARITY.
FT DISULFID 316 325 BY SIMILARITY.
FT DISULFID 332 343 BY SIMILARITY.
FT DISULFID 337 352 BY SIMILARITY.
FT DISULFID 354 363 BY SIMILARITY.
FT DISULFID 370 381 BY SIMILARITY.
FT DISULFID 375 390 BY SIMILARITY.
FT DISULFID 392 401 BY SIMILARITY.
FT DISULFID 408 419 BY SIMILARITY.
FT DISULFID 413 428 BY SIMILARITY.
FT DISULFID 430 439 BY SIMILARITY.
FT CARBOHYD 30 30 POTENTIAL.
FT CARBOHYD 136 136 POTENTIAL.
FT CARBOHYD 357 357 POTENTIAL.



SQ SEQUENCE 570 AA; 61116 MW; 265BC4BB CRC32;

Query Match 6.4%; Score 726; DB 1; Length 570; 

Best Local Similarity 40.3%; Pred. No. 1.32e-125; 

Matches 96; Conservative 54; Mismatches 75; Indels 13; Gaps £ 
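
The "Best Local Similarity" figure appears to be the share of identical positions among all alignment columns. The sketch below checks that reading against the counts printed in this report. The function name and the formula (matches divided by the sum of matches, conservative substitutions, mismatches and indels) are assumptions inferred from the numbers, not a documented definition from the search program.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent of identical columns in a local alignment (assumed definition)."""
    # Every alignment column falls into exactly one of the four classes.
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


# Counts copied from the "Matches ...; Conservative ...;" summary lines.
print(best_local_similarity(96, 54, 75, 13))     # hit of length 570
print(best_local_similarity(168, 100, 229, 42))  # hit of length 2318
print(best_local_similarity(170, 105, 227, 45))  # hit of length 2524
```

For the hit above, 96 identical columns out of 238 gives 40.3, which agrees with the reported value, so the assumed formula is at least consistent with this output.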

Db 178 DDCTPNPCLNGATC-VDQVNDYQCICAPGFTGDNCETDIDECASAPCRNGGAC-V-D-QV 233 

: I :lll I :M I I: hi I: II h:|: I I I l|::||:| : : : 
Qy 916 NPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEE 975 



Db 234 NGYTCNCIPGFNGVNCENNINECASIPCLNGGICVDGINQFACTCLPGYTGILCETDINE 293

:|: I I I: III |:::| I I : Mill ::| I I III III :: 
Qy 976 DGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDF 1035

Db 294 CA-S-SPCQNGGSCTDAVNRYTCDCRAGFTGSNCETNINECASSPCLNGGSCLDGVDGYV 351
II :|||: : I : : : III :|: I :|: ::::| : I ||: I |:|:|| 
Qy 1036 CAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYT 1095 

Db 352 CQCLPNYTGTHCEIS-— L--DACASLPCQNGGVCTNVGGDYVCECLPGYTGINCE 402 

I I I: 1:1 | :| :: ||||: | : :|:||||| I :|| 
Qy 1096 CICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCE 1153 



RESULT 4 

ID NTC3_MOUSE STANDARD; PRT; 2318 AA.

AC Q61982; 

DT 01-NOV-1997 (REL. 35, CREATED) 

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH 3 PROTEIN. 

GN NOTCH3. 

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=ICR X SWISS WEBSTER;

RX MEDLINE; 95001556. 

RA LARDELLI M., DALSTRAND J., LENDAHL U.; 

RT "The novel Notch homologue mouse Notch 3 lacks specific epidermal 

RT growth factor-repeats and is expressed in proliferating

RT neuroepithelium.";

RL MECH. DEV. 46:123-136(1994).

CC -!- FUNCTION: NOTCH 1, 2 AND 3 PLAY A COMBINATIONAL ROLE DURING
CC VARIOUS CELL FATE DECISIONS AND MORPHOLOGICAL MOVEMENTS IN THE 
CC DEVELOPING CNS AND PROBABLY OTHER REGIONS OF THE EMBRYO. 

CC -!- TISSUE SPECIFICITY: PROLIFERATING NEUROEPITHELIUM.

CC -!- DEVELOPMENTAL STAGE: CNS DEVELOPMENT.

CC -!- SIMILARITY: CONTAINS 34 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 CDC10/SWI6 REPEATS.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; X74760; G483581; -. 

DR MGD; MGI:99460; NOTCH3.

DR PROSITE; PS00010; ASX_HYDROXYL; 18.

DR PROSITE; PS00022; EGF_1; 33.

DR PROSITE; PS01186; EGF_2; 27.

DR PROSITE; PS01187; EGF_CA; 17.

DR PFAM; PF00008; EGF; 33. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3.

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE; 

KW GLYCOPROTEIN. 

FT DOMAIN 1 1643 EXTRACELLULAR. 

FT TRANSMEM 1644 1664 POTENTIAL. 

FT DOMAIN 1665 2318 CYTOPLASMIC. 

FT DOMAIN 39 1374 34 X EGF-TYPE REPEATS. 

FT DOMAIN 1388 1503 3 X LIN/NOTCH REPEATS. 

FT DOMAIN 1784 1998 6 X CDC10/SWI6 REPEATS.

FT DOMAIN 2242 2261 PEST.

FT DOMAIN 39 78 EGF-LIKE 1.

FT DOMAIN 79 119 EGF-LIKE 2.

FT DOMAIN 120 157 EGF-LIKE 3.

FT DOMAIN 159 196 EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).

FT DOMAIN 198 235 EGF-LIKE 5.

FT DOMAIN 237 273 EGF-LIKE 6, CALCIUM-BINDING (POTENTIAL).

FT DOMAIN 275 313 EGF-LIKE 7.

FT DOMAIN 315 351 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).

FT DOMAIN 352 390 EGF-LIKE 9.

FT DOMAIN 392 430 EGF-LIKE 10, CALCIUM-BINDING (POTENTIAL).

FT DOMAIN 432 468 EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).

FT DOMAIN 470 506 EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).

FT DOMAIN 508 544 EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).

FT DOMAIN 546 581 EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).

FT DOMAIN 583 619 EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).

FT DOMAIN 621 656 EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).

FT DOMAIN 658 694 EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
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FT 


DOMAIN 


696 


731 


EGF-LIKE 18.


FT 


DISULFID 


646 


655 


BY SIMILARITY. 


FT 


DOMAIN 


735 


771 


EGF-LIKE 19. 


FT 


DISULFID 


662 


673 


BY SIMILARITY. 


FT 


DOMAIN 


772 


809 


EGF-LIKE 20. 


FT 


DISULFID 


667 


682 


BY SIMILARITY, 


FT 


DOMAIN 


811 


848 


EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).


FT 


DISULFID 


684 


693 


BY SIMILARITY. 


FT 


DOMAIN 


850 


886 


EGF-LIKE 22, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


700 


710 


BY SIMILARITY. 


FT 


DOMAIN 


888 


923 


EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


705 


719 


BY SIMILARITY. 


FT 


DOMAIN 


925 


961 


EGF-LIKE 24. 


FT 


DISULFID 


721 


730 


BY SIMILARITY. 


FT 


DOMAIN 


963 


999 


EGF-LIKE 25. 


FT 


DISULFID 


739 


750 


BY SIMILARITY. 


FT 


DOMAIN 


1001 


1035 


EGF-LIKE 26. 


FT 


DISULFID 


744 


759 


BY SIMILARITY. 


FT 


DOMAIN 


1037 


1083 


EGF-LIKE 27. 


FT 


DISULFID 


761 


770 


BY SIMILARITY. 


FT 


DOMAIN 


1085 


1121 


EGF-LIKE 28. 


FT 


DISULFID 


776 


787 


BY SIMILARITY, 


FT 


DOMAIN 


1123 


1159 


EGF-LIKE 29, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


781 


797 


BY SIMILARITY, 


FT 


DOMAIN 


1161 


1204 


EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


799 


808 


BY SIMILARITY. 


FT 


DOMAIN 


1206 


1245 


EGF-LIKE 31. 


FT 


DISULFID 


815 


827 


BY SIMILARITY. 


FT 


DOMAIN 


1247 


1288 


EGF-LIKE 32. 


FT 


DISULFID 


821 


836 


BY SIMILARITY. 


FT 


DOMAIN 


1290 


1326 


EGF-LIKE 33, 


FT 


DISULFID 


838 


847 


BY SIMILARITY. 


FT 


DOMAIN 


1336 


1374 


EGF-LIKE 34. 


FT 


DISULFID 


854 


865 


BY SIMILARITY. 


FT 


REPEAT 


1388 


1428 


LIN/NOTCH 1. 


FT 


DISULFID 


859 


874 


BY SIMILARITY. 


FT 


REPEAT 


1429 


1467 


LIN/NOTCH 2. 


FT 


DISULFID 


876 


885 


BY SIMILARITY, 


FT 


REPEAT 


1468 


1503 


LIN/NOTCH 3. 


FT 


DISULFID 


892 


902 


BY SIMILARITY. 




REPEAT 


1784 


1816 


CDC10/SWI6 1. 


FT 


DISULFID 


897 


911 


BY SIMILARITY, 


FT


REPEAT 


1817 


1865 


CDC10/SWI6 2. 


FT 


DISULFID 


913 


922 


BY SIMILARITY, 


FT


REPEAT 


1866 


1898 


CDC10/SWI6 3. 


FT 


DISULFID 


929 


940 


BY SIMILARITY. 


FT 


REPEAT 


1899 


1932 


CDC10/SWI6 4.


FT 


DISULFID 


934 


949 


BY SIMILARITY. 


FT 


REPEAT 


1933 


1965 


CDC10/SWI6 5. 


FT 


DISULFID 


951 


960 


BY SIMILARITY. 


FT 


REPEAT 


1966 


1998 


CDC10/SWI6 6. 


FT 


DISULFID 


967 


978 


BY SIMILARITY. 


FT 


DISULFID 


43 


55 


BY SIMILARITY, 


FT 


DISULFID 


972 


987 


BY SIMILARITY. 


FT 


DISULFID 


49 


66 


BY SIMILARITY. 


FT 


DISULFID 


989 


998 


BY SIMILARITY. 


FT 


DISULFID 


68 


77 


BY SIMILARITY. 


FT 


DISULFID 


1005 


1016 


BY SIMILARITY. 


FT 


DISULFID 


83 


94 


BY SIMILARITY, 


FT 


DISULFID 


1010 


1023 


BY SIMILARITY. 


FT 


DISULFID 


88 


107 


BY SIMILARITY. 


FT 


DISULFID 


1025 


1034 


BY SIMILARITY. 


FT 


DISULFID 


109 


118 


BY SIMILARITY. 


FT 


DISULFID 


1041 


1062 


BY SIMILARITY. 


FT 


DISULFID 


124 


135 


BY SIMILARITY. 


FT 


DISULFID 


1056 


1071 


BY SIMILARITY. 


FT 


DISULFID 


129 


145 


BY SIMILARITY. 


FT 


DISULFID 


1073 


1082 


BY SIMILARITY. 


FT 


DISULFID 


147 


156 


BY SIMILARITY. 


FT 


DISULFID 


1089 


1100 


BY SIMILARITY. 


FT 


DISULFID 


163 


175 


BY SIMILARITY. 


FT 


DISULFID 


1094 


1109 


BY SIMILARITY. 


FT 


DISULFID 


169 


184 


BY SIMILARITY. 


FT 


DISULFID 


1111 


1120 


BY SIMILARITY. 


FT 


DISULFID 


186 


195 


BY SIMILARITY. 


FT 


DISULFID 


1127 


1138 


BY SIMILARITY. 


FT 


DISULFID 


202 


213 


BY SIMILARITY. 


FT 


DISULFID 


1132 


1147 


BY SIMILARITY. 


FT 


DISULFID 


207 


223 


BY SIMILARITY. 


FT 


DISULFID 


1149 


1158 


BY SIMILARITY. 


FT 


DISULFID 


225 


234 


BY SIMILARITY. 


FT 


DISULFID 


1165 


1183 


BY SIMILARITY. 


FT 


DISULFID 


241 


252 


BY SIMILARITY. 


FT 


DISULFID 


1177 


1192 


BY SIMILARITY. 


FT 


DISULFID 


246 


261 


BY SIMILARITY. 


FT 


DISULFID 


1194 


1203 


BY SIMILARITY, 


FT 


DISULFID 


263 


272 


BY SIMILARITY, 


FT 


DISULFID 


1210 


1223 


BY SIMILARITY. 


FT 


DISULFID 


279 


292 


BY SIMILARITY. 


FT 


DISULFID 


1215 


1233 


BY SIMILARITY. 


FT 


DISULFID 


286 


301 


BY SIMILARITY. 


FT 


DISULFID 


1235 


1244 


BY SIMILARITY. 


FT 


DISULFID 


303 


312 


BY SIMILARITY, 


FT 


DISULFID 


1251 


1262 


BY SIMILARITY. 


FT 


DISULFID 


319 


330 


BY SIMILARITY. 


FT 


DISULFID 


1256 


1276 


BY SIMILARITY. 


FT 


DISULFID 


324 


339 


BY SIMILARITY. 


FT 


DISULFID 


1278 


1287 


BY SIMILARITY. 


FT


DISULFID 


341 


350 


BY SIMILARITY. 


FT 


DISULFID 


1294 


1305 


BY SIMILARITY. 


FT


DISULFID 


356 


367 


BY SIMILARITY. 


FT 


DISULFID 


1299 


1314 


BY SIMILARITY. 


FT 


DISULFID 


361 


378 


BY SIMILARITY. 


FT 


DISULFID 


1316 


1325 


BY SIMILARITY. 


FT 


DISULFID 


380 


389 


BY SIMILARITY. 


FT 


DISULFID 


1340 


1351 


BY SIMILARITY. 


FT 


DISULFID 


396 


409 


BY SIMILARITY. 


FT 


DISULFID 


1345 


1362 


BY SIMILARITY. 


FT 


DISULFID 


403 


418 


BY SIMILARITY. 


FT 


DISULFID 


1364 


1373 


BY SIMILARITY. 


FT 


DISULFID 


420 


429 


BY SIMILARITY. 












FT 


DISULFID 


436 


447 


BY SIMILARITY. 


Note: remainder of annotations omitted.


FT 


DISULFID 


441 


456 


BY SIMILARITY. 












FT 


DISULFID 


458 


467 


BY SIMILARITY. 


Query Match 




6.4%; 


Score 719; DB 1; Length 2318; 


FT 


DISULFID 


474 


485 


BY SIMILARITY. 


Best Local Similarity 31.2%;


Pred. No. 4.93e-124;


FT 


DISULFID 


479 


494 


BY SIMILARITY. 


Matches 168; Conservative 


100; Mismatches 229; Indels 42; Gaps 35; 


FT 


DISULFID 


496 


505 


BY SIMILARITY, 












FT 


DISULFID 


512 


523 


BY SIMILARITY. 


Db 


161 DECRSGTTCRHGGTCLNTPGSF-RCQCPLGYTGLLCENPWPCAPSPCRNGGTC- -RQSS 217 


FT 


DISULFID 


517 


532 


BY SIMILARITY, 




: 1 1 


h: 


III : 1 


II II 1: 1 1: 1: :| ::||::|||| ::: 


FT 


DISULFID 


534 


543 


BY SIMILARITY. 


Qy 


916 NPCLSNP-CKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGE 974 


FT 


DISULFID 


550 


560 


BY SIMILARITY. 












FT 


DISULFID 


555 


569 


BY SIMILARITY. 


Db 


218 DVTYDCACLPGFEGQNCEVNVDDCPGHRCLNGGTCVDGVNTYNCQCPPEWTGQFCTEDVD 277 


FT 


DISULFID 


571 


580 


BY SIMILARITY. 






1 1 II 


Ihllllll 


II : 1 1 :IMII:| 1 1 MM ll::| 1 :| 


FT 


DISULFID 


587 


598 


BY SIMILARITY, 


Qy 


975 EDGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLD 1034 


FT 


DISULFID 


592 


607 


BY SIMILARITY. 










FT 


DISULFID 


609 


618 


BY SIMILARITY. 


Db 


278 ECQLQPNACHNGGTCFNLLGGHSCVCVNGWTGESCSQNIDDCATAVCFHGATCHDRVASF 337 


FT 


DISULFID 


625 


635 


BY SIMILARITY. 




1 : 


1:1:: 


: 1: 1 


II 1 III ::lll 1 :|| 1 1 1 :: 


FT 


DISULFID 


630 


644 


BY SIMILARITY. 


Qy 


1035 FCAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGY 1094 
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Db 338 YCACPMGKTGLLCHLDDACV- - - SNPCHEDAICDTNP - - - VS - GRAICTC PPGFTGGACD 390 



i ii i • ii • i • .1 ..ii.. I • ; I ■ 1 1 | ll : | | : 
Qy 1095 TCICPEGYSGLFCEFSPPMVLPRTSPC-DNFDCQNGAQCIVRINEPICQCLPGYQGEKCE 1153 

Db 391 QDVDECSIGANPCEHLGRC-VNTQGSFLCQCGRGY-TGPR-CETDVNECLSGPCRNQA-T 446 

I I :: :: M :: I : :| | : | : : 
Qy 1154 KLVSVNFINKESYLQIPSAKVRPQTNITLQIATDEDSGILLYKGDKDHIAVELYRGRVRA 1213

Db 447 CLDRIGQFIC-IC-MAGFT-GTYCEVDI-DECQS-SPCVKGGVCKDRVN-GFSCT-C-PS 498 

I : I : : I : I:: II I |:|| I I : I | 
Qy 1214 SYDTGSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITNLSKQSTLNFDS 1273 

Db 499 G-FSGSMC-QLDVDECASTPCRNGAKCVD-QPDGYECRCAEGFEGTLCERNV-DDCSP-D 553

: 1:1 :| :| :||: : I : |: : : | | 
Qy 1274 PLYVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQDFQKVPMQTGILPGCEPCH 1333 

Db 554 P-CHHGRC-VDGIASFSCACAPGYTGIRCESQVDE-CRSQPCRYGGKCLDLVD-KYLCR 608
W I II I : l:|:| I I I I : : : : I : | | 1 1 : I: 

Qy 1334 KKVCAHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 1392

Db 609 CPPGTTGVNCEVNID--D-CASNPCTFGVCR-DGINR-YDCVCQPGFTGPLCNVEINEC 662

I I M:: : I ; I I : : I I :|:|| '|: ||: 
Qy 1393 CLEGHGGVLCDEEEDLFNPCQAIKCKHGKCRLSGLGQPY-CECSSGYTGDSCDREIS-C 1449 



RESULT 5 

ID NOTC_XENLA STANDARD; PRT; 2524 AA.

AC P21783;

DT 01-MAY-1991 (REL. 18, CREATED)

DT 01-OCT-1996 (REL. 34, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG PRECURSOR (XOTCH PROTEIN).

GN XOTCH. 

OS XENOPUS LAEVIS (AFRICAN CLAWED FROG). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; AMPHIBIA; BATRACHIA; ANURA; 

OC MESOBATRACHIA; PIPOIDEA; PIPIDAE; XENOPODINAE; XENOPUS.

RN [1]

RP SEQUENCE FROM N.A.

RX MEDLINE; 90385285.

RA COFFMAN C., HARRIS W., KINTNER C.;

RT "Xotch, the Xenopus homolog of Drosophila notch.";

RL SCIENCE 249:1438-1441(1990).

RN [2]

RP REVISIONS TO 1759-1782.

RA KINTNER C.;
RL SUBMITTED (JUN-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- DEVELOPMENTAL STAGE: EXPRESSED ALMOST UNIFORMLY IN EARLY EMBRYOS.

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS. 

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS. 

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; M33874; G1364263; -. 

DR PIR; A35844; A35844. 

DR PROSITE; PS00010; ASX_HYDROXYL; 23.

DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 29.

DR PROSITE; PS01187; EGF_CA; 21.

DR PFAM; PF00008; EGF; 36. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA.

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;



KW 


TRANSMEMBRANE; SIGNAL; 




FT 


SIGNAL 


2 




POTENTIAL, 


FT 


CHAIN 


20 


2524 




FT DOMAIN 20 1728 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 1729 1750 POTENTIAL.
FT DOMAIN 1751 2524 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 20 57 EGF-LIKE 1.
FT DOMAIN 58 99 EGF-LIKE 2.
FT DOMAIN 102 140 EGF-LIKE 3.
FT DOMAIN 141 177 EGF-LIKE 4.
FT DOMAIN 179 215 EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 217 254 EGF-LIKE 6.
FT DOMAIN 256 292 EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 294 332 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 334 370 EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 371 409 EGF-LIKE 10.
FT DOMAIN 411 449 EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 451 487 EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 489 525 EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 527 563 EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 565 600 EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 602 638 EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 640 675 EGF-LIKE 17.


FT 






III 


EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL). 


FT 




715 


750 


hbt but ID, UUillUM -BINUINt) (POTENTIAL). 


FT 


DOMAIN 


752 


788 


tut blA£i LtUiLlUM DlNUirllj ( HUlLNi Lnh) . 


FT 


DOMAIN 


790 


826 




FT 


DOMAIN 


828 


866 


EGF-LIKE 22, 


FT 


IY1VATN 


868 




EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL). 


FT 


DflYATM 


906 


QAl 




FT 


nnvATH 




980 


tut blfvL id, UUA.XUMd1NUJ.HIj (rUlMillAL) . 


FT 


Wflnill 


982 




Lor jjimj 40 . 




FT DOMAIN 1020 1056 EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1058 1094 EGF-LIKE 28.
FT DOMAIN 1096 1142 EGF-LIKE 29.
FT DOMAIN 1144 1180 EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1182 1218 EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1220 1264 EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1266 1304 EGF-LIKE 33.
FT DOMAIN 1306 1346 EGF-LIKE 34.
FT DOMAIN 1347 1383 EGF-LIKE 35.
FT DOMAIN 1386 1424 EGF-LIKE 36.
FT DOMAIN 1441 1560 3 X LIN/NOTCH REPEATS.
FT REPEAT 1441 1478 LIN/NOTCH 1.
FT REPEAT 1479 1520 LIN/NOTCH 2.
FT REPEAT 1521 1560 LIN/NOTCH 3.
FT DOMAIN 1871 2083 6 X ANK MOTIF REPEATS.


FT DISULFID 22 35 BY SIMILARITY.
FT DISULFID 29 45 BY SIMILARITY.
FT DISULFID 47 56 BY SIMILARITY.
FT DISULFID 62 74 BY SIMILARITY.
FT DISULFID 68 87 BY SIMILARITY.
FT DISULFID 89 98 BY SIMILARITY.
FT DISULFID 106 117 BY SIMILARITY.
FT DISULFID 111 128 BY SIMILARITY.
FT DISULFID 130 139 BY SIMILARITY.
FT DISULFID 145 156 BY SIMILARITY.
FT DISULFID 150 165 BY SIMILARITY.
FT DISULFID 167 176 BY SIMILARITY.
FT DISULFID 183 194 BY SIMILARITY.
FT DISULFID 188 203 BY SIMILARITY.
FT DISULFID 205 214 BY SIMILARITY.
FT DISULFID 221 232 BY SIMILARITY.
FT DISULFID 226 242 BY SIMILARITY.
FT DISULFID 244 253 BY SIMILARITY.
FT DISULFID 260 271 BY SIMILARITY.
FT DISULFID 265 280 BY SIMILARITY.
FT DISULFID 282 291 BY SIMILARITY.
FT DISULFID 298 311 BY SIMILARITY.
FT DISULFID 305 320 BY SIMILARITY.
FT DISULFID 322 331 BY SIMILARITY.
FT DISULFID 338 349 BY SIMILARITY.
FT DISULFID 343 358 BY SIMILARITY.
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FT 


DISULFID 


360 


369 


BY SIMILARITY. 


FT 


DISULFID 1351 1362 BY SIMILARITY. 


FT 


DISULFID 


375 


386 


BY SIMILARITY. 


FT 


DISULFID 1356 1371 BY SIMILARITY. 


FT 


DISULFID 


380 


397 


BY SIMILARITY. 


FT 


DISULFID 1373 1382 BY SIMILARITY. 


FT 


DISULFID 


399 


408 


BY SIMILARITY. 


FT 


DISULFID 1390 1401 BY SIMILARITY. 


FT 


DISULFID 


415 


428 


BY SIMILARITY. 


FT 


DISULFID 1395 1412 BY SIMILARITY. 


FT 


DISULFID 


422 


437 


BY SIMILARITY. 


FT 


DISULFID 1414 1423 BY SIMILARITY. 


FT 


DISULFID 


439 


448 


BY SIMILARITY. 


FT 


CARBOHYD 462 462 POTENTIAL. 


FT 


DISULFID 


455 


466 


BY SIMILARITY. 


FT 


CARBOHYD 887 887 POTENTIAL. 


FT 


DISULFID 


460 


475 


BY SIMILARITY, 






FT 


DISULFID 


477 


486 


BY SIMILARITY. 


Note: remainder of annotations omitted. 


FT 


DISULFID 


493 


504 


BY SIMILARITY. 






FT 


DISULFID 


498 


513 


BY SIMILARITY. 


Query Match 6.3%; Score 714; DB 1; Length 2524;


FT 


DISULFID 


515 


524 


BY SIMILARITY. 


Best Local Similarity 31.1%; Pred. No. 6.53e-123;


FT 


DISULFID 


531 


542 


BY SIMILARITY, 


Matches 170; Conservative 105; Mismatches 227; Indels 45; Gaps 34 


FT 


DISULFID 


536 


551 


BY SIMILARITY. 






FT 


DISULFID 


553 


562 


BY SIMILARITY, 


Db 


181 NECSQNPCKNGGQCINE-FGSYRCTCQNRFTGRNCDEPYVPCNPSPCLNGGTC--RQTDD 237


FT 


DISULFID 


569 


579 


BY SIMILARITY. 




1 I Mill 1 1 :: HIM 1 |::M 1 :| ::|| :|||| ;: :: 


FT 


DISULFID 


574 


588 


BY SIMILARITY. 


Qy 


916 NPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEE 975 


FT 


DISULFID 


590 


599 


BY SIMILARITY. 






FT 


DISULFID 


606 


617 


BY SIMILARITY, 


Db 


238 TSYDCTCLPGFSGQNCEENIDDCPSNNCRNGGTCVDGVNTYNCQCPPDWTGQYCTEDVDE 297 




DISULFID 


611 


626 


BY SIMILARITY. 




'■■ 1 1 II hill hill hi 1 :||lll:| 1 1 III: II: 1 1 :| 


FT


DISULFID 


628 


637 


BY SIMILARITY, 


Qy 


976 DGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDF 1035 


FT


DISULFID 


644 


654 


BY SIMILARITY, 




FT 


DISULFID 


649 


663 


BY SIMILARITY. 


Db 


298 CQLMPNACQNGGTCHNTYGGYNCVCVNGWTGEDCSENIDDCANAACHSGATCHDRVASFY 357 


FT 


DISULFID 


665 


674 


BY SIMILARITY. 




1 Ml: : 1 1 l::| 1 1 II 1 ::lll : 1 :|| 1 1 1 :: 


FT 


DISULFID 


681 


692 


BY SIMILARITY. 


Qy 


1036 CAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYT 1095 


FT 


DISULFID 


686 


701 


BY SIMILARITY. 






FT 


DISULFID 


703 


712 


BY SIMILARITY. 


Db 


358 CECPHGRTGLLCHLDNA-CI--SNPCNEGSNCDTNP---VN-GKAICTCPPGYTGPACNN 410


FT 


DISULFID 


719 


729 


BY SIMILARITY. 




1 II 1 Mhl : : : ::ll : :|: : 1 :|| 1 III 1 |:: 


FT 


DISULFID 


724 


738 


BY SIMILARITY. 


Qy 


1096 CICPEGYSGLFCEFSPPMVLPRTSPC-DNFDCQNGAQCIVRINEPICQCLPGYQGEKCEK 1154 


FT 


DISULFID 


740 


749 


BY SIMILARITY. 




FT 


DISULFID 


756 


767 


BY SIMILARITY. 


Db 


411 DVDECSLGANPC-E-HGGRC-TNTLGSFQCNCPQ--G---YAGPRCEIDVNECLSNPCQN 462


FT 


DISULFID 


761 


776 


BY SIMILARITY. 




1 ::::::: I ::| : I 1 1 : 1 |: : : 


FT 


DISULFID 


778 


787 


BY SIMILARITY. 


Qy 


1155 LVSVNFINRESYLQIPSAKVRPQTNITLQIATDEDSGILLYKGDKDHIAVELYRGRVRAS 1214 


FT 


DISULFID 


794 


805 


BY SIMILARITY. 




FT 


DISULFID 


799 


814 


BY SIMILARITY, 


Db 


463 -DSTCLDQIGEFQCICMPGYEGLYCET-NIDECASNPC-LHNGKCIDKIN-E--FRCDCP 516


FT 


DISULFID 


816 


825 


BY SIMILARITY. 




1: : : : : 1 :|: 1 : I 1 I ::: : : I I 


FT 


DISULFID 


832 


843 


BY SIMILARITY. 


Qy 


1215 YDTGSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITNLSKQSTLNFDSP 1274 


FT 


DISULFID 


837 


854 


BY SIMILARITY.






FT 


DISULFID 


856 


865 


BY SIMILARITY. 


Db 


517 TGFSGNLC-QHDFDECTSTPCKNGAKCLDG--PNSYTCQCTEGFTGRHCEQDI-NECIP- 571 


FT 


DISULFID 


872 


883 


BY SIMILARITY. 




: 1 : : :| 11= : 1 1 1 : 1 : 1 II 


FT 


DISULFID 


877 


892 


BY SIMILARITY. 


Qy 


1275 L-YVGGMPGKSNVASLRQAPGQNGTS-FHGCIRNLYINSELQDFQKVPMQTGILPGCEPC 1332 


FT 


DISULFID 


894 


903 


BY SIMILARITY. 






FT 


DISULFID 


910 


921 


BY SIMILARITY, 


Db 


572 DP--CHYGTCK-DGIATFTCLCRPGYTGRLCDNDINE-CLSKPCLNGGQCTDREN-GYIC 626 


FT 


DISULFID 


915 


930 


BY SIMILARITY. 




1 III : 1 III 1: 1 1 III h II:: |::| 1 : :| I 


FT 


DISULFID 


932 


941 


BY SIMILARITY. 


Qy 


1333 HKKVCAHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKCVHGT-CLPINAFSYSC 1391 


FT 


DISULFID 


986 


997 


BY SIMILARITY. 




FT 


DISULFID 


991 


1006 


BY SIMILARITY. 


Db 


627 TCPKGTTGVNCETKID--D-CASNLCDNGKC-IDKI-DGYECTCEPGYTGKLCNININEC 681 


FT 


DISULFID 


1008 


1017 


BY SIMILARITY, 




1 1 II 1: 1 : 1 : 1 :|ll : : : 1 1 1 :|||| |: :|: 1 


FT


DISULFID 


1024 


1035 


BY SIMILARITY. 


Qy 


1392 KCLEGHGGVLCDEEEDLFNPCQAIRCKHGKCRLSGLGQPY-CECSSGYTGDSCDREIS-C 1449 


FT


DISULFID 


1029 


1044 


BY SIMILARITY. 




FT


DISULFID 


1046 


1055 


BY SIMILARITY. 


Db 


682 DSNPCRN 688 


FT 


DISULFID 


1062 


1073 


BY SIMILARITY. 




I: 


FT 


DISULFID 


1067 


1082 


BY SIMILARITY. 


Qy 


1450 RGERIRD 1456 


FT 


DISULFID 


1084 


1093 


BY SIMILARITY, 




FT 


DISULFID 


1100 


1121 


BY SIMILARITY, 






FT 


DISULFID 


1115 


1130 


BY SIMILARITY. 


RESULT 6 


FT 


DISULFID 


1132 


1141 


BY SIMILARITY. 


ID 


NTC1_RAT STANDARD; PRT; 2531 AA.


FT 


DISULFID 


1148 


1159 


BY SIMILARITY, 


AC 


Q07008; 


FT 


DISULFID 


1153 


1168 


BY SIMILARITY, 


DT 


01-NOV-1995 (REL. 32, CREATED) 


FT 


DISULFID 


1170 


1179 


BY SIMILARITY. 


DT 


01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 


FT 


DISULFID 


1186 


1197 


BY SIMILARITY, 


DT 


15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 


FT 


DISULFID 


1191 


1206 


BY SIMILARITY. 


DE 


NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1 PRECURSOR.


FT 


DISULFID 


1208 


1217 


BY SIMILARITY. 


GN 


NOTCH1.


FT 


DISULFID 


1224 


1243 


BY SIMILARITY, 


OS 


RATTUS NORVEGICUS (RAT). 


FT 


DISULFID 


1237 


1252 


BY SIMILARITY. 


OC 


EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 


FT 


DISULFID 


1254 


1263 


BY SIMILARITY. 


OC 


RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.


FT 


DISULFID 


1270 


1283 


BY SIMILARITY. 


RN 


[1] 


FT 


DISULFID 


1275 


1292 


BY SIMILARITY. 


RP 


SEQUENCE FROM N.A. 


FT 


DISULFID 


1294 


1303 


BY SIMILARITY, 


RC 


TISSUE=SCHWANN CELL;


FT 


DISULFID 


1310 


1321 


BY SIMILARITY, 


RX 


MEDLINE; 92111383. 


FT 


DISULFID 


1315 


1333 


BY SIMILARITY. 


RA 


WEINMASTER G., ROBERTS V.J., LEMKE G.; 


FT 


DISULFID 


1335 


1344


BY SIMILARITY. 


RT 


"A homolog of Drosophila Notch expressed during mammalian 
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RT 


development."; 






FT 


DOMAIN 


1449 


1462 


CYS-RICH. 


RL 


DEVELOPMENT 113:199-205(1991).


FT 


DOMAIN 


1865 


2076 


6 X ANK MOTIF REPEATS. 


CC 


-!- FUNCTION: REQUIRED FOR THE CORRECT DIFFERENTIATION OF A NUMBER


FT 


REPEAT 


1865 


1910 


ANK MOTIF 1. 


CC


OF TISSUES. 






FT 


REPEAT 


1912 


1942 


ANK MOTIF 2. 


CC 


-!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN. 


FT 


REPEAT 


1944 


1975 


ANK MOTIF 3. 


CC


-!- DEVELOPMENTAL STAGE: IN THE EMBRYO, HIGHEST LEVELS OCCUR BETWEEN


FT 


REPEAT 


1978 


2009 


ANK MOTIF 4. 


CC


DAYS 12 AND 14 AND DECREASE RAPIDLY TO MUCH LOWER LEVELS IN THE 


FT 


REPEAT 


2011 


2042 


ANK MOTIF 5. 


CC


ADULT. 






FT 


REPEAT 


2044 


2076 


ANK MOTIF 6. 


CC


-!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.


FT 


DISULFID 


24 


37 


BY SIMILARITY. 


CC


-!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.


FT 


DISULFID 


31 


46 


BY SIMILARITY. 


CC


-!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.


FT 


DISULFID 


48 


57 


BY SIMILARITY. 


CC


-!- SIMILARITY: CONTAINS 6 ANK REPEATS.


FT 


DISULFID 


63 


74 


BY SIMILARITY.


CC










FT 


DISULFID 


68 


87 


BY SIMILARITY.


CC


This SWISS-PROT entry is copyright. It is produced through a collaboration 


FT 


DISULFID 


89 


98 


BY SIMILARITY. 


CC


between the Swiss Institute of Bioinformatics and the EMBL outstation -


FT 


DISULFID 


106 


117 


BY SIMILARITY. 


CC


the European Bioinformatics Institute. There are no restrictions on its 


FT 


DISULFID 


111 


127 


BY SIMILARITY. 


CC


use by non-profit institutions as long as its content is in no way


FT 


DISULFID 


129 


138 


BY SIMILARITY.


CC


modified and this statement is not removed. Usage by and for commercial 


FT 


DISULFID 


144 


155 


BY SIMILARITY. 


CC


entities requires a license agreement (See http://www.isb-sib.ch/announce/ 


FT 


DISULFID 


149 


164 


BY SIMILARITY. 




or send an email to license@isb-sib.ch).


FT 


DISULFID 


166 


175 


BY SIMILARITY. 


CC










FT 


DISULFID 


182 


195 


BY SIMILARITY. 


DR 


EMBL; X57405; G57635; -. 




FT 


DISULFID 


189 


204 


BY SIMILARITY. 


DR 


PROSITE; PS00010; ASX_HYDROXYL; 22.


FT 


DISULFID 


206 


215 


BY SIMILARITY. 


DR 


PROSITE; PS00022; EGF_1; 35.




FT 


DISULFID 


222 


233 


BY SIMILARITY. 


DR 


PROSITE; PS01186; EGF_2; 26.




FT 


DISULFID 


227 


243 


BY SIMILARITY. 


DR 


PROSITE; PS01187; EGF_CA; 21.


FT 


DISULFID 


245 


254 


BY SIMILARITY. 


DR 


PFAM; PF00008; EGF; 35.




FT 


DISULFID 


261 


272 


BY SIMILARITY.


DR 


PFAM; PF00023; ank; 6. 




FT 


DISULFID 


266 


281 


BY SIMILARITY.


DR 


PFAM; PF00066; notch; 3. 




FT 


DISULFID 


283 


292 


BY SIMILARITY.


DR 


HSSP; P00740; 1IXA. 




FT 


DISULFID 


299 


312 


BY SIMILARITY. 


KW 


DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 


FT 


DISULFID 


306 


321 


BY SIMILARITY. 


KW 


TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN. 


FT 


DISULFID 


323 


332 


BY SIMILARITY.


FT 


SIGNAL 


1 


18 


POTENTIAL. 


FT 


DISULFID 


339 


350 


BY SIMILARITY. 


FT 


CHAIN 


19 


2531 


NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1. 


FT 


DISULFID 


344 


359 


BY SIMILARITY. 


FT 


DOMAIN 


19 


1723 


EXTRACELLULAR (POTENTIAL). 


FT 


DISULFID 


361 


370 


BY SIMILARITY. 


FT 


TRANSMEM 


1724 


1746 


POTENTIAL. 


FT 


DISULFID 


376 


387 


BY SIMILARITY. 


FT


DOMAIN 


1747 


2531 


CYTOPLASMIC (POTENTIAL). 


FT 


DISULFID 


381 


398 


BY SIMILARITY. 


FT 


DOMAIN 


20 


58 


EGF-LIKE 1. 


FT 


DISULFID 


400 


409 


BY SIMILARITY. 


FT 


DOMAIN 


59 


99 


EGF-LIKE 2. 


FT 


DISULFID 


416 


429 


BY SIMILARITY.


FT 


DOMAIN 


102 


139 


EGF-LIKE 3. 


FT 


DISULFID 


423 


438 


BY SIMILARITY. 


FT 


DOMAIN 


140 


176 


EGF-LIKE 4. 


FT 


DISULFID 


440 


449 


BY SIMILARITY. 


FT 


DOMAIN 


178 


216 


EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


456 


467 


BY SIMILARITY. 


FT 


DOMAIN 


218 


255 


EGF-LIKE 6. 


FT 


DISULFID 


461 


476 


BY SIMILARITY. 


FT 


DOMAIN 


257 


293 


EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


478 


487 


BY SIMILARITY.


FT 


DOMAIN 


295 


333 


EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


494 


505 


BY SIMILARITY. 


FT


DOMAIN 


335 


371 


EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


499 


514 


BY SIMILARITY. 


FT


DOMAIN 


372 


410 


EGF-LIKE 10. 


FT 


DISULFID 


516 


525 


BY SIMILARITY. 


FT


DOMAIN 


412 


450 


EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


532 


543 


BY SIMILARITY. 


FT 


DOMAIN 


452 


488 


EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


537 


552 


BY SIMILARITY. 


FT 


DOMAIN 


490 


526 


EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


554 


563 


BY SIMILARITY. 


FT 


DOMAIN 


528 


564 


EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


570 


580 


BY SIMILARITY. 


FT 


DOMAIN 


566 


601 


EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


575 


589 


BY SIMILARITY. 


FT 


DOMAIN 


603 


639 


EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


591 


600 


BY SIMILARITY. 


FT 


DOMAIN 


641 


676 


EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).


FT 


DISULFID 


607 


618 


BY SIMILARITY.


FT 


DOMAIN 


678 


714 


EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).


FT 


DISULFID 


612 


627 


BY SIMILARITY. 


FT 


DOMAIN 


716 


751 


EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).


FT 


DISULFID 


629 


638 


BY SIMILARITY. 


FT 


DOMAIN 


753 


789 


EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


645 


655 


BY SIMILARITY. 


FT 


DOMAIN 


791 


827 


EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


650 


664 


BY SIMILARITY. 


FT 


DOMAIN 


829 


867 


EGF-LIKE 22. 


FT 


DISULFID 


666 


675 


BY SIMILARITY. 


FT 


DOMAIN 


869 


905 


EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


682 


693 


BY SIMILARITY. 


FT 


DOMAIN 


907 


943 


EGF-LIKE 24. 


FT 


DISULFID 


687 


702 


BY SIMILARITY.


FT 


DOMAIN 


945 


981 


EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


704 


713 


BY SIMILARITY. 


FT 


DOMAIN 


983 


1019 


EGF-LIKE 26. 


FT 


DISULFID 


720 


730 


BY SIMILARITY. 


FT 


DOMAIN 


1021 


1057 


EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


725 


739 


BY SIMILARITY. 


FT 


DOMAIN 


1059 


1095 


EGF-LIKE 28. 


FT 


DISULFID 


741 


750 


BY SIMILARITY. 


FT 


DOMAIN 


1097 


1143 


EGF-LIKE 29. 


FT 


DISULFID 


757 


768 


BY SIMILARITY. 


FT 


DOMAIN 


1145 


1181 


EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


762 


777 


BY SIMILARITY. 


FT 


DOMAIN 


1183 


1219 


EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


779 


788 


BY SIMILARITY. 


FT 


DOMAIN 


1221 


1265 


EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


795 


806 


BY SIMILARITY. 


FT 


DOMAIN 


1267 


1305 


EGF-LIKE 33. 


FT 


DISULFID 


800 


815 


BY SIMILARITY. 


FT 


DOMAIN 


1307 


1346 


EGF-LIKE 34. 


FT 


DISULFID 


817 


826 


BY SIMILARITY. 


FT 


DOMAIN 


1348 


1384 


EGF-LIKE 35. 


FT 


DISULFID 


833 


844 


BY SIMILARITY.


FT 


DOMAIN 


1387 


1426 


EGF-LIKE 36. 


FT 


DISULFID 


838 


855 


BY SIMILARITY.
FT


DISULFID 


857 


866 


BY SIMILARITY.




DISULFID 


873 


884 


BY SIMILARITY. 


FT 


DISULFID 


878 


893 


BY SIMILARITY.


FT 


DISULFID 


895 


904 


BY SIMILARITY.


FT 


DISULFID 


911 


922 


BY SIMILARITY. 




DISULFID 


916 


931 


BY SIMILARITY.


FT 


DISULFID 


933 


942 


BY SIMILARITY.


FT 


DISULFID 


987 


998 


BY SIMILARITY. 




DISULFID 


992 


1007 


BY SIMILARITY.


FT 


DISULFID 


1009 


1018 


BY SIMILARITY.


FT 


DISULFID 


1025 


1036 


BY SIMILARITY.


FT 




1030 


1045 


BY SIMILARITY.


FT 


DISULFID 


1047 


1056 


BY SIMILARITY.




DISULFID 


1063 


1074 


BY SIMILARITY.


FT 




1068 


1083 


BY SIMILARITY.


FT 


DISULFID


1085 


1094 


BY SIMILARITY.


FT 


DISULFID 


1101 


1122 


BY SIMILARITY.


FT 


DISULFID 


1116 


1131 


BY SIMILARITY.


FT 


DISULFID 


1133 


1142 


BY SIMILARITY.


FT 


DISULFID 


1149 


1160 


BY SIMILARITY. 




DISULFID 


1154 


1169 


BY SIMILARITY. 


FT


DISULFID 


1171 


1180 


BY SIMILARITY.


FT


DISULFID 


1187 


1198 


BY SIMILARITY.


FT 


DISULFID 


1192 


1207 


BY SIMILARITY.


FT 


DISULFID 


1209 


1218 


BY SIMILARITY.


FT 


DISULFID 


1225 


1244 


BY SIMILARITY.


FT 


DISULFID 


1238 


1253 


BY SIMILARITY.


FT 


DISULFID 


1255 


1264 


BY SIMILARITY.


FT 


DISULFID 


1271 


1284 


BY SIMILARITY.


FT 


DISULFID 


1276 


1293 


BY SIMILARITY. 


FT 


DISULFID 


1295 


1304 


BY SIMILARITY.


FT 


DISULFID 


1311 


1322 


BY SIMILARITY. 


FT 


DISULFID 


1316 


1334 


BY SIMILARITY. 


FT 


DISULFID 


1336 


1345 


BY SIMILARITY. 


FT 


DISULFID 


1352 


1363 


BY SIMILARITY. 


FT 


DISULFID 


1357 


1372 


BY SIMILARITY. 


FT 


DISULFID 


1374 


1383 


BY SIMILARITY.


FT 


DISULFID 


1391 


1403 


BY SIMILARITY. 



Note: remainder of annotations omitted. 

Query Match 6.3%; Score 708; DB 1; Length 2531; 

Best Local Similarity 29.1%; Pred. No. 1.45e-121;

Matches 161; Conservative 116; Mismatches 229; Indels 48; Gaps 38; 

Db 420 ANPCEHAGKCL-NTLGSFECQCLQGYTGPRCEIDVNECISNPCQNDATC-L-D-QIGEFQ 475

:IM : I I : : : I I |: I |:: :: llllll : :|| | : : I 
Qy 920 SNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFW 979 

Db 476 CICMPGYEGVYCEINTDECASSPCLHNGRCVDKINEFLCQCPKGFSGHLCQYDVDECA-S 534
III hi! 1:1 |:| ; I :|: III ||:; I || ::| ||: :| I 
Qy 980 CICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDFCAQD 1039

Db 535 -TPCKNGAKCLDGPNTYTCVCTEGYTGTHCEVDIDECDPDPCHIGL-CKDGVATFTCLCQ 592

II : :lh |: : I II II I l|::|:|:|: : I I I |:| :||:| 
Qy 1040 LNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICP 1099 

Db 593 PGYTGHHCE-TN- • I - - vNECHSQPCRHGGTCQDRDNYYLCLCLKGTTGPNCEINLDDC 645 

Ihl II : : : I : |::|: I II :| II I I :|l :| 
Qy 1100 EGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCE-KL--V 1156

Db 646 ASNPCDSGTCLDKIDGYECACEPGYTGSMCNVNIDECAGSPCHNGGTCEDGIAGFTCRCP 705 

: I : : I: I : : I : : II :| :| :: : I 
Qy 1157 SVNFINKESYLQ-IPSAKVRPQTNIT--L-QIATDEDSGILLYKGDKDHIAVELYRGRVR 1212 

Db 706 EGYHDPTCLSEVNECNSNPCIHGACRD-GLNGYKCDCAPGWSGTNCDINNNEC-ESNPCV 763 

:| I : : : I : I : : : I I I I :| 
Qy 1213 ASY-DTGSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITNLSKQSTLNF 1271 

Db 764 NGGTCKD-MTS-GYVCTCREGFSGPNCQTNINECASNPCLNQGTCIDDVAGYKCNCPLPY 821 

:: . I ; : I : I:: :| | |::: I I :| : ::| : : |: 
Qy 1272 DSPLYVGGMPGKSNVASLRQA-PGQN-GTSFHGCIRNLYIN-SE-LQD--FQ-KVPMQ- 1322 



Db 822 TGATCEWLAPCATSPCKNSGVCRESEDYESFSCVCPTGWQGQTCEIDINE-CVKSPCRH 880 

II II I : I I I ; :h Mill: |: |: : I I 

Qy 1323 TGILPGC--EPCHRKVCAH-GTCQPSSQ-AGFTCECQEGWMGPLCDQRTNDPCLGNKCVH 1378 

Db 881 GASCQNTNG-SYRCLCQAGYTGRNCESDID--D-CRPNPCHNGGSCT-DGVNAAFCDCLP 935 

1:1 I: III II I |: : I : |:: I :| I |: ::|:| : 
Qy 1379 GT-CLPINAFSYSCKCLEGHGGVLCDEEEDLFNPCQAIKCKHG-KCRLSGLGQPYCECSS 1436

Db 936 GFQGAFCEEDINEC 949 

I: I I: :!: I 
Qy 1437 GYTGDSCDREIS-C 1449 



RESULT 7 

ID NOTC_DROME STANDARD; PRT; 2703 AA. 

AC P07207; P04154; 

DT 01-NOV-1986 (REL. 03, CREATED)

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH PROTEIN PRECURSOR.

GN N. 

OS DROSOPHILA MELANOGASTER (FRUIT FLY).

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;

OC DROSOPHILIDAE; DROSOPHILA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 86079539. 

RA WHARTON K.A., JOHANSEN K.M., XU T., ARTAVANIS-TSAKONAS S.;

RT "Nucleotide sequence from the neurogenic locus notch implies a gene 

RT product that shares homology with proteins containing EGF-like 

RT repeats."; 

RL CELL 43:567-581(1985). 

RN [2] 

RP SEQUENCE FROM N.A.

RC STRAIN=OREGON-R;

RX MEDLINE; 87064624. 

RA KIDD S., KELLEY M.R., YOUNG M.W.; 

RT "Sequence of the notch locus of Drosophila melanogaster: relationship 

RT of the encoded protein to mammalian clotting and growth factors."; 

RL MOL. CELL. BIOL. 6:3094-3108(1986). 

RN [3] 

RP SEQUENCE OF 2505-2611 FROM N.A. 

RX MEDLINE; 85099329. 

RA WHARTON K.A., YEDVOBNICK B., FINNERTY V.G., ARTAVANIS-TSAKONAS S.;

RT "opa: a novel family of transcribed repeats shared by the Notch locus 

RT and other developmentally regulated loci in D. melanogaster.";

RL CELL 40:55-62(1985). 

RN [4] 

RP SEQUENCE OF 1-8 FROM N.A. 

RX MEDLINE; 87257846. 

RA KELLEY M.R., KIDD S., BERG R.L., YOUNG M.W.;

RT "Restriction of P-element insertions at the Notch locus of Drosophila 

RT melanogaster."; 

RL MOL. CELL. BIOL. 7:1545-1548(1987).

RN [5] 

RP REVIEW. 

RA HARRIS W.A.; 

RT "Many cell types specified by Notch function."; 

RL CURR. BIOL. 1:120-122(1991).

CC -!- FUNCTION: NOTCH PROTEIN IS ESSENTIAL FOR PROPER DIFFERENTIATION OF
CC ECTODERM. 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- SEPARATION OF NEUROBLASTS FROM THE ECTODERM INTO THE INNER PART

CC OF EMBRYO IS ONE OF THE FIRST STEPS OF CNS DEVELOPMENT IN INSECTS.

CC THIS PROCESS IS UNDER CONTROL OF THE NEUROGENIC GENES. 

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS. 

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

DR EMBL; M16152; G157988; -.
DR EMBL; M16153; G157988; JOINED.
DR EMBL; M16149; G157988; JOINED.
DR EMBL; M16150; G157988; JOINED.
DR EMBL; M16151; G157988; JOINED.
DR EMBL; K03508; G157993; -.
DR EMBL; M13689; G157993; JOINED.
DR EMBL; K03507; G157993; JOINED.
DR EMBL; M12175; G950317; -.
DR EMBL; M16025; G157995; -.
DR PIR; A24420; A24420.
DR PIR; A24768; A24768.
DR PIR; A05267; A05267.
DR FLYBASE; FBgn0004647; N.
DR PROSITE; PS00010; ASX_HYDROXYL; 22.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 28.
DR PROSITE; PS01187; EGF_CA; 22.
DR PFAM; PF00008; EGF; 36.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.

FT   SIGNAL        1     44       POTENTIAL.
FT   CHAIN        45   2703       NEUROGENIC LOCUS NOTCH PROTEIN.
FT   DOMAIN       45   1745       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM   1746   1766       POTENTIAL.
FT   DOMAIN     1767   2703       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       58   1451       36 X EGF-TYPE REPEATS.
FT   DOMAIN       58     95       EGF-LIKE 1.
FT   DOMAIN       96    136       EGF-LIKE 2.
FT   DOMAIN      139    176       EGF-LIKE 3.
FT   DOMAIN      177    215       EGF-LIKE 4.
FT   DOMAIN      217    253       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      255    291       EGF-LIKE 6.
FT   DOMAIN      293    329       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      331    370       EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      372    408       EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      409    447       EGF-LIKE 10.
FT   DOMAIN      449    486       EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      488    524       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      526    562       EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      564    600       EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      602    637       EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      639    675       EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      677    713       EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      715    751       EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      753    789       EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      791    827       EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      829    865       EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      867    905       EGF-LIKE 22.
FT   DOMAIN      907    944       EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      946    982       EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      984   1020       EGF-LIKE 25.
FT   DOMAIN     1022   1058       EGF-LIKE 26, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1060   1096       EGF-LIKE 27.
FT   DOMAIN     1098   1134       EGF-LIKE 28.
FT   DOMAIN     1136   1181       EGF-LIKE 29.
FT   DOMAIN     1183   1219       EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1221   1257       EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1259   1295       EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1297   1335       EGF-LIKE 33.
FT   DOMAIN     1337   1373       EGF-LIKE 34.
FT   DOMAIN     1375   1412       EGF-LIKE 35.
FT   DOMAIN     1415   1451       EGF-LIKE 36.
FT   DOMAIN     1475   1593       3 X LIN/NOTCH REPEATS.



FT 


REPEAT 


1475 


1513 


LIN/NOTCH 1. 


FT 


REPEAT 


1514 


1553 


LIN/NOTCH 2. 


FT 


REPEAT 


1554 


1593 


LIN/NOTCH 3. 


FT 


DOMAIN 


1896 


2109 


6 X ANK MOTIF 


FT 


DOMAIN 


2538 


2568 


POLY-GLN (OPA- 


FT 


DISULFID 


62 


73 


BY SIMILARITY. 


FT 


DISULFID 


67 


83 


BY SIMILARITY. 


FT 


DISULFID 


85 


94 


BY SIMILARITY. 


FT 


DISULFID 


100 


111 


BY SIMILARITY. 


FT 


DISULFID 


105 


124 


BY SIMILARITY. 


FT 


DISULFID 


126 


135 


BY SIMILARITY. 


FT 


DISULFID 


143 


154 


BY SIMILARITY. 


FT 


DISULFID 


148 


164 


BY SIMILARITY. 


FT


DISULFID 


166 


175 


BY SIMILARITY. 


FT 


DISULFID 


181 


192 


BY SIMILARITY. 


FT 


DISULFID 


186 


203 


BY SIMILARITY. 


FT 


DISULFID 


205 


214 


BY SIMILARITY. 


FT 


DISULFID 


221 


232 


BY SIMILARITY. 


FT 


DISULFID 


226 


241 


BY SIMILARITY.


FT 


DISULFID 


243 


252 


BY SIMILARITY. 


FT 


DISULFID 


259 


270 


BY SIMILARITY. 


FT 


DISULFID 


264 


279 


BY SIMILARITY.


FT 


DISULFID 


281 


290 


BY SIMILARITY. 


FT 


DISULFID 


297 


308 


BY SIMILARITY.


FT 


DISULFID 


302 


317 


BY SIMILARITY. 


FT 


DISULFID 


319 


328 


BY SIMILARITY. 


FT 


DISULFID 


335 


349 


BY SIMILARITY. 


FT 


DISULFID 


343 


358 


BY SIMILARITY.


FT 


DISULFID 


360 


369 


BY SIMILARITY. 


FT 


DISULFID 


376 


387 


BY SIMILARITY. 


FT 


DISULFID 


381 


396 


BY SIMILARITY.


FT 


DISULFID 


398 


407 


BY SIMILARITY. 


FT 


DISULFID 


413 


424 


BY SIMILARITY.


FT 


DISULFID 


418 


435 


BY SIMILARITY.


FT 


DISULFID 


437 


446 


BY SIMILARITY. 


FT 


DISULFID 


453 


465 


BY SIMILARITY. 


FT 


DISULFID 


459 


474 


BY SIMILARITY. 


FT 


DISULFID 


476 


485 


BY SIMILARITY.


FT 


DISULFID 


492 


503 


BY SIMILARITY. 


FT 


DISULFID 


497 


512 


BY SIMILARITY.


FT 


DISULFID 


514 


523 


BY SIMILARITY. 


FT 


DISULFID 


530 


541 


BY SIMILARITY. 


FT 


DISULFID 


535 


550 


BY SIMILARITY. 


FT 


DISULFID 


552 


561 


BY SIMILARITY. 


FT 


DISULFID 


568 


579 


BY SIMILARITY. 


FT 


DISULFID 


573 


588 


BY SIMILARITY. 


FT 


DISULFID 


590 


599 


BY SIMILARITY. 


FT 


DISULFID 


606 


616 


BY SIMILARITY. 


FT 


DISULFID 


611 


625 


BY SIMILARITY. 


FT 


DISULFID 


627 


636 


BY SIMILARITY. 


FT 


DISULFID 


643 


654 


BY SIMILARITY. 


FT 


DISULFID 


648 


663 


BY SIMILARITY. 


FT 


DISULFID 


665 


674 


BY SIMILARITY. 


FT 


DISULFID 


681 


692 


BY SIMILARITY. 


FT 


DISULFID 


686 


701 


BY SIMILARITY. 


FT 


DISULFID 


703 


712 


BY SIMILARITY. 


FT 


DISULFID 


719 


730 


BY SIMILARITY. 


FT 


DISULFID 


724 


739 


BY SIMILARITY. 


FT 


DISULFID 


741 


750 


BY SIMILARITY. 


FT 


DISULFID 


757 


768 


BY SIMILARITY. 


FT 


DISULFID 


762 


777 


BY SIMILARITY. 


FT 


DISULFID 


779 


788 


BY SIMILARITY. 


FT 


DISULFID 


795 


806 


BY SIMILARITY. 


FT 


DISULFID 


800 


815 


BY SIMILARITY. 


FT 


DISULFID 


817 


826 


BY SIMILARITY.


FT 


DISULFID 


833 


844 


BY SIMILARITY.


FT 


DISULFID 


838 


853 


BY SIMILARITY.


FT 


DISULFID 


855 


864 


BY SIMILARITY. 



Note: remainder of annotations omitted. 

Query Match 6.3%; Score 708; DB 1; Length 2703; 

Best Local Similarity 44.8%; Pred. No. 1.45e-121; 
Matches 108; Conservative 37; Mismatches 82; Indels 14; Gaps 7;

Db 490 NECESHPCQNEGSCLDDPGTF-RCVCMPGFTGTQCEIDIDECQSNPCLNDGTCH--D-KI 545 

I I MM 1:1:1 1 1 I II I III :|:: I I llll : till : 
Qy 916 NPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEE 975

Db 546 NGFKCSCALGFTGARCQINIDDCQSQPCRNRGICHDSIAGYSCECPPGYTGTSCEININD 605 

Ml I II II I |::|:IM: I I : I |:| |:| III III II ::: 
Qy 976 DGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDF 1035

Db 606 C--DSNPCHR-GKCIDDVNSFKCLCDPGYTGYICQKQINECESNPCQFDGHCQDRVGSYY 662

I I III:: :MI ::lll I III I |: ::::|: II :|| I I :| 
Qy 1036 CAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYT 1095

Db 663 CQCQAGTSGKNCE V--NVNECHSNPCNNGATCIDGINSYKCQCVPGFTGQHCEKN 715 

I I I II II I : I : I III II II 111:11: I: III 
Qy 1096 CICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEKL 1155

Db 716 V 716 

I 

Qy 1156 V 1156



RESULT 8 

ID NTC1_MOUSE STANDARD; PRT; 2531 AA.

AC Q01705;

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1 PRECURSOR (MOTCH PROTEIN).

GN NOTCH1 OR MOTCH. 

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=EMBRYO;

RX MEDLINE; 93194170. 

RA FRANCO DEL AMO F., GENDRON-MAGUIRE M., SWIATEK P.J., JENKINS N.A., 

RA COPELAND N.G., GRIDLEY T.;

RT "Cloning, analysis, and chromosomal localization of Notch-1, a mouse 

RT homolog of Drosophila Notch."; 

RL GENOMICS 15:259-264(1993). 

RN [2] 

RP SEQUENCE OF 1551-2170 FROM N.A. 

RC TISSUE=EMBRYO;

RX MEDLINE; 93048835. 

RA FRANCO DEL AMO F., SMITH D.E., SWIATEK P.J., GENDRON-MAGUIRE M.,

RA GREENSPAN R.J., MCMAHON A.P., GRIDLEY T.;
RT "Expression pattern of Motch, a mouse homolog of Drosophila Notch,
RT suggests an important role in early postimplantation mouse
RT development.";

RL DEVELOPMENT 115:737-744(1992). 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- DEVELOPMENTAL STAGE: EXPRESSED ALMOST UNIFORMLY IN EARLY EMBRYOS. 

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; Z11886; G288503; -. 

DR MGD; MGI:97363; NOTCH1.

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 27.





PROSITE; PS01187; EGF_CA; 21.


DR 


PFAM; PF00008; EGF; 35.






PFAM; PF00023; ank; 6. 




DR


PFAM; PF00066; notch; 3. 




DR 


HSSP; P00740; 1IXA. 






DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;


KW


TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN. 


FT


SIGNAL 


1 


18 


POTENTIAL. 




CHAIN 


19 


2531 


NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1.


FT 


DOMAIN 


19 


1725 


EXTRACELLULAR (POTENTIAL). 


FT


TRANSMEM


1726 


1746 


POTENTIAL.


FT


DOMAIN 


1747 


2531 


CYTOPLASMIC (POTENTIAL). 


FT


DOMAIN 


24 


1425 


36 X EGF-TYPE REPEATS. 




DOMAIN 


1449 


1462 


CYS-RICH. 




DOMAIN 


1445 


1562 


3 X LIN/NOTCH REPEATS. 




REPEAT 


1445 


1480 


LIN/NOTCH 1. 


FT


REPEAT 


1481 


1522 


LIN/NOTCH 2. 


FT 


REPEAT 


1523 


1562 


LIN/NOTCH 3. 


FT 


DOMAIN 


1865 


2075 


6 X ANK MOTIF REPEATS. 




REPEAT 


1865 


1910 


ANK MOTIF 1. 


FT 


REPEAT 


1912 


1942 


ANK MOTIF 2. 




REPEAT 


1944 


1975 


ANK MOTIF 3. 


FT 


REPEAT 


1978 


2009 


ANK MOTIF 4. 


FT 


REPEAT 


2011 


2042 


ANK MOTIF 5. 


FT 


REPEAT 


2044 


2075 


ANK MOTIF 6. 


FT 


CARBOHYD 


888 


888 


POTENTIAL. 


FT 


CARBOHYD 


959 


959 


POTENTIAL. 


FT 


CARBOHYD 


1179 


1179 


POTENTIAL. 


FT 


CARBOHYD 


1241 


1241 


POTENTIAL. 


FT 


CARBOHYD 


1489 


1489 


POTENTIAL. 


FT 


CARBOHYD 


1587 


1587 


POTENTIAL. 


SQ


SEQUENCE 


2531 


AA; 271312 MW; AD71189B CRC32; 


Query Match 6.2%; Score 704; DB 1; Length 2531;



Best Local Similarity 28.5%; Pred. No. 1.14e-120;
Matches 158; Conservative 118; Mismatches 230; Indels 48; Gaps 38; 

Db 420 ANRCEHAGKCL-NTLGSFECQCLQGYTGPGCEIDVNECISNPCQNDATC-L-D-QIGEFQ 475 

:| I : I I : : : I I I: I |:: :: MINI : MM : : I 
Qy 920 SNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFW 979 

Db 476 CICMPGYEGVYCEINTDECASSPCLHNGHCMDKIHEFQCQCPKGFNGHLCQYDVDECA-S 534 

III Ml Ihl 1:1 : I :|: M |::: I II : I II: :| II 
Qy 980 CICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDFCAQD 1039 

Db 535 -TPCKNGAKCLDGPNTYTCVCTEGYTGTHCEVDIDECDPDPCHYGS-CKDGVATFTCLCQ 592 

II : :lh h : I II II I ll::|:|:|: : I h I hi :||:| 
Qy 1040 LNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICP 1099

Db 593 PGYTGHBCB-TN--I — NECHSQPCRHGGTCQDRDNSYLCLCLKGTTGPNCEINLDDC 645 

l!:| II : : : I : l::|: I II :| II I Ml :| 
Qy 1100 EGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCE-KL--V 1156 

Db 646 ASNPCDSGTCLDKIDGYECACEPGYTGSMCNVNIDECAGSPCHNGGTCEDGIAGFTCRCP 705 

: I : : h I : : I : : II :| :| :: : I 
Qy 1157 SVNFINKESYLQ-IPSAKVRPQTNIT--L-QIATDEDSGILLYKGDKDHIAVELYRGRVR 1212 

Db 706 EGYHDPTCLSEVNECNSNPCIHGACRD-GLNGYKCDCAPGWSGTNCDINNNEC-ESNPCV 763 

:| I : : : I : I : : : I I I I :| 
Qy 1213 ASY-DTGSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITNLSKQSTLNF 1271 

Db 764 NGGTCKD-MTS-GYVCTCREGFSGPNCQTNINECASNPCLNQGTCIDDVAGYKCNCPLPY 821 

:: I : : I : |:: :| I |::: I I :| : ::| : : |: 
Qy 1272 DSPLYVGGMPGKSNVASLRQA-PGQN-GTSFHGCIRNLYIN-SE-LQD--FQ-KVPMQ- 1322 

Db 822 TGATCEWLAPCATSPCKNSGVCKESEDYESFSCVCPTGWQGQTCEVDINE-CVKSPCRH 880 

II II I : I I I : :|:l Mill: |: |: : I I 

Qy 1323 TGILPGC--EPCHKKVCAH-GTCQPSSQ-AGFTCECQEGWMGPLCDQRTNDPCLGNKCVH 1378 

Db 881 GASCQNTNG-SYRCLCQAGYTGRNCESDID-D-CRPNPCHNGGSCT-DGINTAFCDCLP 935 

I: I I: II I I II I: : I : I:: I :| I h ::|:| : 
Qy 1379 GT-CLPINAFSYSCKCLEGHGGVLCDEEEDLFNPCQAIKCKHG-KCRLSGLGQPYCECSS 1436 
Db 936 GFQGAFCEEDINEC 949

I: I h :|: I 
Qy 1437 GYTGDSCDREIS-C 1449 



ID NOTC_BRARE STANDARD; PRT; 2437 AA.
AC P46530;
DT 01-NOV-1995 (REL. 32, CREATED)
DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN PRECURSOR.
GN NOTCH.
OS BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII;
OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA;
OC CYPRINIDAE; RASBORINAE; DANIO.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=EMBRYO;
RX MEDLINE; 94128602.
RA BIERKAMP C., CAMPOS-ORTEGA J.A.;
RT "A zebrafish homologue of the Drosophila neurogenic gene Notch and
RT its pattern of transcription during early embryogenesis.";
RL MECH. DEV. 43:87-100(1993).

CC -!- FUNCTION: IMPLICATED IN CELL FATE SPECIFICATIONS DURING
CC EMBRYO DEVELOPMENT. MAY BE INVOLVED IN THE FORMATION OF THE
CC NEURAL PLATE, NOTOCHORD AND BRAIN VESICLES.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- DEVELOPMENTAL STAGE: EXPRESSED IN ALL CELLS IN PREGASTRULATION
CC STAGES. DURING GASTRULATION IS DIFFERENTIALLY EXPRESSED,
CC ACCUMULATING PREDOMINANTLY IN THE PRECHORDAL MESODERM AND
CC NOTOCHORD. AT THE END OF GASTRULATION, EXPRESSED ALONG THE
CC ANTERIOR-POSTERIOR AXIS INCLUDING THE DEVELOPING NEURAL PLATE
CC AND DIFFERENTIATING MESODERM, ALSO PRESENT IN THE DEVELOPING
CC BRAIN AND HEAD REGIONS.

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

DR EMBL; X69088; G433867; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 23.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 28.
DR PROSITE; PS01187; EGF_CA; 22.
DR PFAM; PF00008; EGF; 36.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.



FT 


SIGNAL 


1 


20 


POTENTIAL. 


FT 


CHAIN 


21 


2437 


NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN.


FT 


DOMAIN 


21 


1724 


EXTRACELLULAR (POTENTIAL). 


FT 


TRANSMEM 


1725 


1747 


POTENTIAL.


FT 


DOMAIN 


1748 


2437 


CYTOPLASMIC (POTENTIAL). 


FT 


DOMAIN 


21 


57 


EGF-LIKE 1. 


FT 


DOMAIN 


58 


98 


EGF-LIKE 2. 


FT 


DOMAIN 


101 


138 


EGF-LIKE 3. 


FT 


DOMAIN 


139 


175 


EGF-LIKE 4. 


FT 


DOMAIN 


177 


215 


EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).


FT 


DOMAIN 


217 


254 


EGF-LIKE 6. 


FT 


DOMAIN 


256 


292 


EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).


FT


DOMAIN 


294 


332 


EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).



FT 


DOMAIN 


334 


370 


EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


371 


409 


EGF-LIKE 10. 


FT 


DOMAIN 


411 


449 


EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


451 


487 


EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


489 


524 


EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


526 


562 


EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).


FT 


DOMAIN 


564 


599 


EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


601 


637 


EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


639 


674 


EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


676 


712 


EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


714 


749 


EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


751 


787 


EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


789 


825 


EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


827 


865 


EGF-LIKE 22. 


FT 


DOMAIN 


867 


903 


EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


905 


941 


EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


943 


979 


EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


981 


1017 


EGF-LIKE 26. 


FT 


DOMAIN 


1019 


1055 


EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


1057 


1093 


EGF-LIKE 28. 


FT 


DOMAIN 


1095 


1141 


EGF-LIKE 29. 


FT 


DOMAIN 


1143 


1179 


EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


1181 


1217 


EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


1219 


1263 


EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


1265 


1303 


EGF-LIKE 33. 


FT 


DOMAIN 


1305 


1344 


EGF-LIKE 34. 


FT 


DOMAIN 


1346 


1382 


EGF-LIKE 35. 


FT 


DOMAIN 


1385 


1423 


EGF-LIKE 36. 


FT   DOMAIN     1446   1561       3 X LIN/NOTCH REPEATS.
FT   REPEAT     1446   1486       LIN/NOTCH 1.
FT   REPEAT     1487   1520       LIN/NOTCH 2.
FT   REPEAT     1521   1561       LIN/NOTCH 3.
FT   DOMAIN     1861   2074       6 X ANK MOTIF REPEATS.
FT   REPEAT     1861   1891       ANK MOTIF 1.
FT   REPEAT     1892   1940       ANK MOTIF 1.
FT   REPEAT     1941   1974       ANK MOTIF 1.
FT   REPEAT     1975   2007       ANK MOTIF 1.
FT   REPEAT     2008   2040       ANK MOTIF 1.
FT   REPEAT     2041   2074       ANK MOTIF 1.
FT   DOMAIN     2265   2276       POLY-GLN (OPA-REPEAT).
FT   DISULFID     25     35       BY SIMILARITY.
FT   DISULFID     29     45       BY SIMILARITY.
FT   DISULFID     47     56       BY SIMILARITY.
FT   DISULFID     62     73       BY SIMILARITY.
FT   DISULFID     67     86       BY SIMILARITY.
FT   DISULFID     88     97       BY SIMILARITY.
FT   DISULFID    105    116       BY SIMILARITY.
FT   DISULFID    110    126       BY SIMILARITY.
FT   DISULFID    128    137       BY SIMILARITY.
FT   DISULFID    143    154       BY SIMILARITY.
FT   DISULFID    148    163       BY SIMILARITY.
FT   DISULFID    165    174       BY SIMILARITY.
FT   DISULFID    181    194       BY SIMILARITY.
FT   DISULFID    188    203       BY SIMILARITY.
FT   DISULFID    205    214       BY SIMILARITY.
FT   DISULFID    221    232       BY SIMILARITY.
FT   DISULFID    226    242       BY SIMILARITY.
FT   DISULFID    244    253       BY SIMILARITY.
FT   DISULFID    260    271       BY SIMILARITY.
FT   DISULFID    265    280       BY SIMILARITY.
FT   DISULFID    282    291       BY SIMILARITY.
FT   DISULFID    298    311       BY SIMILARITY.
FT   DISULFID    305    320       BY SIMILARITY.
FT   DISULFID    322    331       BY SIMILARITY.
FT   DISULFID    338    349       BY SIMILARITY.
FT   DISULFID    343    358       BY SIMILARITY.
FT   DISULFID    360    369       BY SIMILARITY.
FT   DISULFID    375    386       BY SIMILARITY.
FT   DISULFID    380    397       BY SIMILARITY.
FT   DISULFID    399    408       BY SIMILARITY.
FT   DISULFID    415    428       BY SIMILARITY.
FT   DISULFID    422    437       BY SIMILARITY.
FT   DISULFID    439    448       BY SIMILARITY.
FT   DISULFID           466       BY SIMILARITY.
FT   DISULFID    460    475       BY SIMILARITY.
FT   DISULFID    477    486       BY SIMILARITY.
FT   DISULFID    493    503       BY SIMILARITY.
FT   DISULFID    498    512       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID    530    541       BY SIMILARITY.
FT   DISULFID    535    550       BY SIMILARITY.
FT   DISULFID    552    561       BY SIMILARITY.
FT   DISULFID    568    578       BY SIMILARITY.
FT   DISULFID    fin    587       BY SIMILARITY.
FT   DISULFID    589    598       BY SIMILARITY.
FT   DISULFID    605    616       BY SIMILARITY.
FT   DISULFID    610    625       BY SIMILARITY.
FT   DISULFID    627    636       BY SIMILARITY.
FT   DISULFID    643    653       BY SIMILARITY.
FT   DISULFID    648    662       BY SIMILARITY.
FT   DISULFID    664    673       BY SIMILARITY.
FT   DISULFID    680    691       BY SIMILARITY.
FT   DISULFID    685    700       BY SIMILARITY.
FT   DISULFID    702    711       BY SIMILARITY.
FT   DISULFID    718    728       BY SIMILARITY.
FT   DISULFID    723    737       BY SIMILARITY.
FT   DISULFID    739    748       BY SIMILARITY.
FT   DISULFID    755    766       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID    777    7RR       BY SIMILARITY.
FT   DISULFID    793              BY SIMILARITY.
FT   DISULFID    798    813       BY SIMILARITY.
FT   DISULFID    815    824       BY SIMILARITY.
FT   DISULFID    831              BY SIMILARITY.
FT   DISULFID    836    853       BY SIMILARITY.
FT   DISULFID    855              BY SIMILARITY.
FT   DISULFID    871    882       BY SIMILARITY.
FT   DISULFID    876    891       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID    qnq    q?n       BY SIMILARITY.
FT   DISULFID    914    929       BY SIMILARITY.
FT   DISULFID    931    940       BY SIMILARITY.
FT   DISULFID    947    958       BY SIMILARITY.
FT   DISULFID    952    967       BY SIMILARITY.
FT   DISULFID    969    978       BY SIMILARITY.
FT   DISULFID   1023   1034       BY SIMILARITY.
FT   DISULFID   1028   1043       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID   1061   1072       BY SIMILARITY.
FT   DISULFID   1066   1081       BY SIMILARITY.
FT   DISULFID   1083   1092       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID   1114   1129       BY SIMILARITY.
FT   DISULFID   1131   1140       BY SIMILARITY.
FT   DISULFID   1147   1158       BY SIMILARITY.
FT   DISULFID   1152   1167       BY SIMILARITY.
FT   DISULFID   1169   1178       BY SIMILARITY.
FT   DISULFID   1185   1196       BY SIMILARITY.
FT   DISULFID   1190   1205       BY SIMILARITY.
FT   DISULFID   1207   1216       BY SIMILARITY.
FT   DISULFID   1223   1242       BY SIMILARITY.
FT   DISULFID   1236   1251       BY SIMILARITY.
FT   DISULFID   1253   1262       BY SIMILARITY.



Note: remainder of annotations omitted. 



Query Match 5.9%; Score 667; DB 1; Length 2437; 

Best Local Similarity 39.3%; Pred. No. 2.16e-112; 

Matches 94; Conservative 53; Mismatches 79; Indels 13; Gaps 9; 

Db 791 NECASNPCLNQGSCIDDVAGF-KCNCMLPYTGEVCENVLAPCSPRPCKNGGVC--RESED 847 

! I IMI 1:1:1 I I :| I : |: h : :| : llhll I :|:|: 
Qy 916 NPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEE 975

Db 848 FQSFSCNCPAGWQGQTCEVDINECVRNPCTNGGVCENLRGGFQCRCNPGFTGALCENDID 907 

::| I I: I :h lll::::| I I I : I : : I I I :|| :| 



Qy 976 -DGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLD 1034 

Db 908 DC--EPNPCSNGGVCQDRVNGFVCVCLAGFRGERCAEDIDECVSAPCRNGGNCTDCVNSY 965

I : III : : I HI I I :|: IN hhl |:||::||| Ihl 
Qy 1035 FCAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGY 1094 

Db 966 TCSCPAGFSGINCEINTPDC-TESS-C--FN--GGT-CVDGISSFSCVCLPGFTGNYCQ 1017 

II II Ml: II:: I :| II: I: |: I: ' I I I: I: |: 
Qy 1095 TCICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCE 1153 



RESULT 10 

ID NTC1_HUMAN STANDARD; PRT; 2444 AA.

AC P46531; 

DT 01-NOV-1995 (REL. 32, CREATED)

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)

DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG 1 PRECURSOR (TRANSLOCATION-

DE ASSOCIATED NOTCH PROTEIN TAN-1) (FRAGMENT).

GN NOTCH1 OR TAN1.

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO.

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 91347367. 

RA ELLISEN L.W., BIRD J., WEST D.C., SORENG A.L., REYNOLDS T.C.,

RA SMITH S.D., SKLAR J.; 

RT "TAN-1, the human homolog of the Drosophila notch gene, is broken by 

RT chromosomal translocations in T lymphoblastic neoplasms.";

RL CELL 66:649-661(1991). 

CC -!- FUNCTION: MAY BE IMPORTANT FOR NORMAL LYMPHOCYTE FUNCTION. IN 

CC ALTERED FORM, MAY CONTRIBUTE TO TRANSFORMATION OR PROGRESSION 

CC IN SOME T-CELL NEOPLASMS. 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- TISSUE SPECIFICITY: IN FETAL TISSUES MOST ABUNDANT IN SPLEEN,

CC BRAIN STEM AND LUNG. ALSO PRESENT IN MOST ADULT TISSUES WHERE IT 

CC IS FOUND MAINLY IN LYMPHOID TISSUES. 

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS. 

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; M73980; G338675; -.

DR MIM; 190198; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 20.

DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 26.

DR PROSITE; PS01187; EGF_CA; 18.

DR PFAM; PF00008; EGF; 35.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.



FT   SIGNAL        1     18       POTENTIAL.
FT   CHAIN        19  >2444       NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG 1.
FT   DOMAIN       19   1736       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM   1737   1757       POTENTIAL.
FT   DOMAIN     1758  >2444       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       20     58       EGF-LIKE 1.
FT   DOMAIN       59     99       EGF-LIKE 2.
FT   DOMAIN      102    139       EGF-LIKE 3.
FT   DOMAIN      140    176       EGF-LIKE 4.
FT   DOMAIN      178    216       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      218    255       EGF-LIKE 6.
FT   DOMAIN      257    293       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      295    333       EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      335    371       EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      372    410       EGF-LIKE 10.
FT   DOMAIN      412    450       EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      452    488       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      490    526       EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      528    564       EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      566    601       EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      603    639       EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      641    676       EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      678    714       EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      716    751       EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      753    789       EGF-LIKE 20.
FT   DOMAIN      791    827       EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      829    868       EGF-LIKE 22.
FT   DOMAIN      870    906       EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      908    944       EGF-LIKE 24.
FT   DOMAIN      946    982       EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      984   1020       EGF-LIKE 26.
FT   DOMAIN     1022   1058       EGF-LIKE 27.
FT   DOMAIN     1060   1096       EGF-LIKE 28.
FT   DOMAIN     1098   1144       EGF-LIKE 29.
FT   DOMAIN     1146   1182       EGF-LIKE 30.
FT   DOMAIN     1184   1220       EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1222   1266       EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1268   1306       EGF-LIKE 33.
FT   DOMAIN     1308   1347       EGF-LIKE 34.
FT   DOMAIN     1349   1385       EGF-LIKE 35.
FT   DOMAIN     1388   1427       EGF-LIKE 36.
FT   DOMAIN     1446   1563       3 X LIN/NOTCH REPEATS.
FT   REPEAT     1446   1481       LIN/NOTCH 1.
FT   REPEAT     1482   1523       LIN/NOTCH 2.
FT   REPEAT     1524   1563       LIN/NOTCH 3.
FT   DOMAIN     1876   2087       6 X ANK MOTIF REPEATS.
FT   REPEAT     1876   1921       ANK MOTIF 1.
FT   REPEAT     1923   1954       ANK MOTIF 2.
FT   REPEAT     1956   1987       ANK MOTIF 3.
FT   REPEAT     1990   2021       ANK MOTIF 4.
FT   REPEAT     2023   2054       ANK MOTIF 5.
FT   REPEAT     2056   2087       ANK MOTIF 6.
FT   DOMAIN     1576   1579       POLY-VAL.
FT   DOMAIN     1662   1665       POLY-ARG.
FT   DOMAIN     1729   1732       POLY-PRO.
FT   DOMAIN     1741   1744       POLY-ALA.
FT   DOMAIN     1902   1905       POLY-GLU.
FT   DOMAIN     2260   2263       POLY-GLY.
FT   DOMAIN     2404   2407       POLY-GLN.
FT   DOMAIN     2411   2418       POLY-PRO.
FT   DISULFID     24     37       BY SIMILARITY.
FT   DISULFID     31     46       BY SIMILARITY.
FT   DISULFID     48     57       BY SIMILARITY.
FT   DISULFID     63     74       BY SIMILARITY.
FT   DISULFID     68     87       BY SIMILARITY.
FT   DISULFID     89     98       BY SIMILARITY.
FT   DISULFID    106    117       BY SIMILARITY.
FT   DISULFID    111    127       BY SIMILARITY.
FT   DISULFID    129    138       BY SIMILARITY.
FT   DISULFID    144    155       BY SIMILARITY.
FT   DISULFID    149    164       BY SIMILARITY.
FT   DISULFID    166    175       BY SIMILARITY.
FT   DISULFID    182    195       BY SIMILARITY.
FT   DISULFID    189    204       BY SIMILARITY.
FT   DISULFID    206    215       BY SIMILARITY.
FT   DISULFID    222    233       BY SIMILARITY.
FT   DISULFID    227    243       BY SIMILARITY.
FT   DISULFID    245    254       BY SIMILARITY.
FT   DISULFID    261    272       BY SIMILARITY.
FT   DISULFID    266    281       BY SIMILARITY.
FT   DISULFID    283    292       BY SIMILARITY.
FT   DISULFID    299    312       BY SIMILARITY.
FT   DISULFID    306    321       BY SIMILARITY.
FT   DISULFID    323    332       BY SIMILARITY.
FT   DISULFID    339    350       BY SIMILARITY.
FT   DISULFID    344    359       BY SIMILARITY.
FT   DISULFID    361    370       BY SIMILARITY.
FT   DISULFID    376    387       BY SIMILARITY.
FT   DISULFID    381    398       BY SIMILARITY.
FT   DISULFID    400    409       BY SIMILARITY.
FT   DISULFID    416    429       BY SIMILARITY.
FT   DISULFID    423    438       BY SIMILARITY.
FT   DISULFID    440    449       BY SIMILARITY.
FT   DISULFID    456    467       BY SIMILARITY.
FT   DISULFID    461    476       BY SIMILARITY.
FT   DISULFID    478    487       BY SIMILARITY.
FT   DISULFID    494    505       BY SIMILARITY.
FT   DISULFID    499    514       BY SIMILARITY.
FT   DISULFID    516    525       BY SIMILARITY.
FT   DISULFID    532    543       BY SIMILARITY.
FT   DISULFID    537    552       BY SIMILARITY.
FT   DISULFID    554    563       BY SIMILARITY.
FT   DISULFID    570    580       BY SIMILARITY.
FT   DISULFID    575    589       BY SIMILARITY.
FT   DISULFID    591    600       BY SIMILARITY.
FT   DISULFID    607    618       BY SIMILARITY.
FT   DISULFID    612    627       BY SIMILARITY.
FT   DISULFID    629    638       BY SIMILARITY.
FT   DISULFID    645    655       BY SIMILARITY.
FT   DISULFID    650    664       BY SIMILARITY.
FT   DISULFID    666    675       BY SIMILARITY.
FT   DISULFID    682    693       BY SIMILARITY.
FT   DISULFID    687    702       BY SIMILARITY.
FT   DISULFID    704    713       BY SIMILARITY.
FT   DISULFID    720    730       BY SIMILARITY.
FT   DISULFID    725    739       BY SIMILARITY.
FT   DISULFID    741    750       BY SIMILARITY.
FT   DISULFID    757    768       BY SIMILARITY.
FT   DISULFID    762    777       BY SIMILARITY.
FT   DISULFID    779    788       BY SIMILARITY.
FT   DISULFID    795    806       BY SIMILARITY.
FT   DISULFID    800    815       BY SIMILARITY.
FT   DISULFID    817    826       BY SIMILARITY.
FT   DISULFID    833    844       BY SIMILARITY.
FT   DISULFID    838    855       BY SIMILARITY.
FT   DISULFID    857    867       BY SIMILARITY.
FT   DISULFID    874    885       BY SIMILARITY.
FT   DISULFID    879    894       BY SIMILARITY.
FT   DISULFID    896    905       BY SIMILARITY.
FT   DISULFID    912    923       BY SIMILARITY.
FT   DISULFID    917    932       BY SIMILARITY.
FT   DISULFID    934    943       BY SIMILARITY.
FT   DISULFID    988    999       BY SIMILARITY.
FT   DISULFID    993   1008       BY SIMILARITY.
FT   DISULFID   1010   1019       BY SIMILARITY.
FT   DISULFID   1026   1037       BY SIMILARITY.
FT   DISULFID   1031   1046       BY SIMILARITY.
FT   DISULFID   1048   1057       BY SIMILARITY.
FT   DISULFID   1064   1075       BY SIMILARITY.
FT   DISULFID   1069   1084       BY SIMILARITY.
FT   DISULFID   1086   1095       BY SIMILARITY.
FT   DISULFID   1102   1123       BY SIMILARITY.
FT   DISULFID   1117   1132       BY SIMILARITY.
FT   DISULFID   1134   1143       BY SIMILARITY.
FT   DISULFID   1150   1161       BY SIMILARITY.
FT   DISULFID   1155   1170       BY SIMILARITY.
FT   DISULFID   1172   1181       BY SIMILARITY.
FT   DISULFID   1188   1199       BY SIMILARITY.
FT   DISULFID   1193   1208       BY SIMILARITY.



Note: remainder of annotations omitted. 



Query Match 5.9%; Score 667; DB 1; Length 2444; 

Best Local Similarity 30.2%; Pred. No. 2.16e-112;

Matches 167; Conservative 115; Mismatches 220; Indels 51; Gaps 37;

Db 832 PCAPSPCRNGGECRQSEDYESFSCVCPTAGAKGQTCEVDINECVLSPCRHGASCQNTHGX 891

II ::lhl II h : : 1 II 1 III |:| |: h :||:||::|: 1
Qy 917 PCLSNPCKNDGTC-NSDPVDFYRCTCPY-GFKGQDCDVPIHACISNPCKHGGTCHLKEGE 974

Db 892 ---YRCHCQAGYSGRNCETDIDDCRPNPCHNGGSCTDGINTAFCDCLPGFRGTFCEEDIN 948

1 |: I III ::||| I | | ::| MM 1 1 1 : 1 MM
Qy 975 EDGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLD 1034

Db 949 ECASD--PCRNGANCTDCVDSYICTCPAGFSGIHCENHTPDCTESSCFNGGTCVDGINSF 1006

II 1 lh: ::| :: 1 1 :|: 1 M: : || :: I ||: 1 |::|::
Qy 1035 FCAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGY 1094

Db 1007 TCLCPPGFTGSYCQ- - -HW - - - ■ NECDSRPCLLGGTCQDGRGLHRCTCPQGYTGPNCQN 1059

MM MM M: M : M: 1 M 1 1 1 II 1 :|::
Qy 1095 TCICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEK 1154

Db 1060 LV-HWCDSSPC-K-NGGKCW-QTH-T-QYRCECPSGWTGLY-CDVPSVSCEV-AAQ-RQ 1109

II : : : ::| : ||: I I : || || I :: |: :: |
Qy 1155 LVSVNFINKESYLQIPSAKVRPQTNITLQIATDEDSG-ILLYKGDKDHIAVELYRGRVRA 1213

Db 1110 GVDVARLCQHGGLCVDAGNTHHCRCQAGYTGSYCEDL-VDECSPSPCQNGATCTDYLGGY 1168

1 : 1 : : !:: | : : : MM 1 : 1
Qy 1214 SYDTGSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITNLSK-QSTLNFD 1272

Db 1169 SCKCVAGYHGVNCSEEIDECLSH-PCQNGGTCLD-LPNTYKCSCPRGTQGVHCEINVDDC 1226

1 |:| 1 : I: : 1 : 1 III : : 1 1 1 : 1 1 : :
Qy 1273 SPLYVGGMPG - K - S N - VAS - LRQ APGQNGTS FHGC IRNLY INSELQDFQKV PMQTGI — 1325

Db 1227 NPPVDPVSRSPKCFNNGTC-VDQVGGYSCTCPPGFVGERCEGDVNE-CLSNPCDARGTQN 1284

1 M : 1 MM :|::| 1 M M |: ||:| I M
Qy 1326 LPGCEPCHKKV-C-AHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKC-VHGT- 1380

Db 1285 CVQRVN-DFHCECRAGHTGRRC---ESVINGCKGKPCKNGGTCAVASNTARGFICKCPAG 1340

1: : 1 1 II 1 1 1 ::| 1 : l|:| 1 : 1 :: : I MM
Qy 1381 CLPINAFSYSCKCLEGHGGVLCDEEEDLFNPCQAIKCKHG-KCRL-SGLGQPY-CECSSG 1437

Db 1341 FEGATCENDARTC 1353

: 1 :|: : ■ :|
Qy 1438 YTGDSCDREI-SC 1449



RESULT 11

ID DLL1_HUMAN STANDARD; PRT; 723 AA.
AC O00548;
DT 15-JUL-1998 (REL. 36, CREATED)
DT 15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE DELTA-LIKE PROTEIN 1 PRECURSOR (DELTA1).
GN DLL1.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RA MANN R.S., GRAY G.E., HENRIQUE D., ISH-HOROWICZ D.,
RA ARTAVANIS-TSAKONAS S.;
RL SUBMITTED (MAY-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.
CC -!- FUNCTION: MAY BE INVOLVED IN CELL-TO-CELL COMMUNICATION IN
CC MAMMALIAN EMBRYOS. MAY HAVE A ROLE IN CELLULAR INTERACTIONS
CC UNDERLYING SOMITOGENESIS AND DEVELOPMENT OF THE NERVOUS SYSTEM (BY
CC SIMILARITY).
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: TO DROSOPHILA DELTA PROTEIN.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; AF003522; G2197069; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 3.
DR PROSITE; PS00022; EGF_1; 8.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 1.
DR PFAM; PF00008; EGF; 6.
DR HSSP; P00740; 1IXA.
KW SIGNAL; EGF-LIKE DOMAIN; GLYCOPROTEIN; TRANSMEMBRANE.
FT   SIGNAL        1     17       POTENTIAL.
FT   CHAIN        18    723       DELTA-LIKE PROTEIN 1.
FT   DOMAIN       18    545       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    546    568       POTENTIAL.
FT   DOMAIN      569    723       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN      226    254       EGF-LIKE 1.
FT   DOMAIN      257    285       EGF-LIKE 2.
FT   DOMAIN      292    325       EGF-LIKE 3.
FT   DOMAIN      332    363       EGF-LIKE 4, CALCIUM BINDING (POTENTIAL).
FT   DOMAIN      370    402       EGF-LIKE 5.
FT   DOMAIN      409    440       EGF-LIKE 6.
FT   DOMAIN      447    478       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      485    516       EGF-LIKE 8.
FT   DISULFID    226    237       BY SIMILARITY.
FT   DISULFID    230    243       BY SIMILARITY.
FT   DISULFID    245    254       BY SIMILARITY.
FT   DISULFID    257    268       BY SIMILARITY.
FT   DISULFID    263    274       BY SIMILARITY.
FT   DISULFID    276    285       BY SIMILARITY.
FT   DISULFID    292    304       BY SIMILARITY.
FT   DISULFID           314       BY SIMILARITY.
FT   DISULFID    316    325       BY SIMILARITY.
FT   DISULFID    332    343       BY SIMILARITY.
FT   DISULFID    337    352       BY SIMILARITY.
FT   DISULFID    354    363       BY SIMILARITY.
FT   DISULFID    370    381       BY SIMILARITY.
FT   DISULFID    375    391       BY SIMILARITY.
FT   DISULFID    393    402       BY SIMILARITY.
FT   DISULFID    409    420       BY SIMILARITY.
FT   DISULFID    414    429       BY SIMILARITY.
FT   DISULFID    431    440       BY SIMILARITY.
FT   DISULFID    447    467       BY SIMILARITY.
FT   DISULFID    469    478       BY SIMILARITY.
FT   DISULFID    485    496       BY SIMILARITY.
FT   DISULFID    490    505       BY SIMILARITY.
FT   DISULFID    507    516       BY SIMILARITY.
FT   CARBOHYD    477    477       POTENTIAL.
SQ SEQUENCE 723 AA; 77956 MW; A1D48BDB CRC32;



Query Match 5.7%; Score 646; DB 1; Length 723; 

Best Local Similarity 38.2%; Pred. No. 1.03e-107; 

Matches 91; Conservative 53; Mismatches 81; Indels 13; Gaps 9; 

Db 296 KPCKNGATCTNTGQGSYTCSCRPGYTGATCELGIDECDPSPCKNGGSC--TD-LENSYSC 352 

:IMI :ll : I 1:1 h I |:: I I ::|||:||:| : |::: I 
Qy 921 NPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFWC 980 

Db 353 TCPPGFYGKICELSAMTCADGPCFNGGRCSDSPDGGYSCRCPVGYSGFNCEKKIDYCS-S 411 

I: II I lh: II 11:11-:: MM M || MM: 
Qy 981 ICADGFEGENCEVNVDDCEDNDCENNSTCVDGINN-YTCLCPPEYTGELCEEKLDFCAQD 1039 

Db 412 -SPCSNGAKCVDLGDAYLCRCQAGFSGRHCDDNVDDCASSPCANGGTCRDGVNDFSCTCP 470 

Ml : MM :: I I :|: I III : III : I M I |:|| ::| II 
Qy 1040 LNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICP 1099 

Db 471 PGYTGRNC-SAP-V-SR--CEHAPCHNGATCHERGHGYVCECARGYGGPNCQFLLP 521 

MM I |:| I M I:: |:||| I | : MM || I :|: |:: 
Qy 1100 EGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEKLVS 1157 



RESULT 12 

ID DLL1_RAT STANDARD; PRT; 714 AA.
AC P97677; 

DT 01-NOV-1997 (REL. 35, CREATED) 

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE)



Tue Jun 1 10:16:12 1999



US-09-191-647-2.rsp
DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)
DE DELTA-LIKE PROTEIN 1 PRECURSOR (DELTA1).
GN DLL1.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RA DISIBIO G., HEBSHI L., BOULTER J., WEINMASTER G.;
RL SUBMITTED (DEC-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.
CC -!- FUNCTION: MAY BE INVOLVED IN CELL-TO-CELL COMMUNICATION IN
CC MAMMALIAN EMBRYOS. MAY HAVE A ROLE IN CELLULAR INTERACTIONS
CC UNDERLYING SOMITOGENESIS AND DEVELOPMENT OF THE NERVOUS SYSTEM (BY
CC SIMILARITY).
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: TO DROSOPHILA DELTA PROTEIN.
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; U7 889; G1699046; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 3.
DR PROSITE; PS00022; EGF_1; 3.
DR PROSITE; PS01186; EGF_2; 3.
DR PROSITE; PS01187; EGF_CA; 2.
DR PFAM; PF00008; EGF; 6.
DR HSSP; P00740; 1IXA.
KW SIGNAL; EGF-LIKE DOMAIN; GLYCOPROTEIN; TRANSMEMBRANE.


FT   SIGNAL        1     17       POTENTIAL.
FT   CHAIN        18    714       DELTA-LIKE PROTEIN 1.
FT   DOMAIN       18    537       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    538    560       POTENTIAL.
FT   DOMAIN      561    714       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN      225    253       EGF-LIKE 1.
FT   DOMAIN      256    284       EGF-LIKE 2.
FT   DOMAIN      291    324       EGF-LIKE 3.
FT   DOMAIN      331    362       EGF-LIKE 4, CALCIUM BINDING (POTENTIAL).
FT   DOMAIN      369    401       EGF-LIKE 5.
FT   DOMAIN      408    439       EGF-LIKE 6.
FT   DOMAIN      446    477       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      484    515       EGF-LIKE 8.
FT   DISULFID    225    236       BY SIMILARITY.
FT   DISULFID    229    242       BY SIMILARITY.
FT   DISULFID    244    253       BY SIMILARITY.
FT   DISULFID    256    267       BY SIMILARITY.
FT   DISULFID    262    273       BY SIMILARITY.
FT   DISULFID    275    284       BY SIMILARITY.
FT   DISULFID    291    303       BY SIMILARITY.
FT   DISULFID    297    313       BY SIMILARITY.
FT   DISULFID    315    324       BY SIMILARITY.
FT   DISULFID    331    342       BY SIMILARITY.
FT   DISULFID    336    351       BY SIMILARITY.
FT   DISULFID    353    362       BY SIMILARITY.
FT   DISULFID    369    380       BY SIMILARITY.
FT   DISULFID    374    390       BY SIMILARITY.
FT   DISULFID    392    401       BY SIMILARITY.
FT   DISULFID    408    419       BY SIMILARITY.
FT   DISULFID    413    428       BY SIMILARITY.
FT   DISULFID    430    439       BY SIMILARITY.
FT   DISULFID    446    466       BY SIMILARITY.
FT   DISULFID    468    477       BY SIMILARITY.
FT   DISULFID    484    495       BY SIMILARITY.
FT   DISULFID    489    504       BY SIMILARITY.
FT   DISULFID    506    515       BY SIMILARITY.
FT   CARBOHYD    476    476       POTENTIAL.
SQ SEQUENCE 714 AA; 77378 MW; 604B76D1 CRC32;



Query Match 5.6%; Score 631; DB 1; Length 714; 

Best Local Similarity 36.6%; Pred. No. 2.22e-104; 

Matches 87; Conservative 56; Mismatches 82; Indels 13; Gaps 9; 

Db 295 KPCRNGATCTNTGQGSYTCSCRPGYTGANCELEVDECAPSPCRNGGSC--TD-LEDSYSC 351 

:lhl :ll : I hi h I :|:: : I ::||::||:| : II:: i 
Qy 921 NPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFWC 980 

Db 352 TCPPGFYGKVCELSAMTCADGPCFNGGRCSDNPDGGYTCHCPAGFSGFNCEKKIDLCS-S 410 

I: II I 11- II I I : II : III I!: ::| II |:|:|: 
Qy 981 ICADGFEGENCEVNVDDCEDNDCENNSTCVDGINN-YTCLCPPEYTGELCEEKLDFCAQD 1039 

Db 411 -SPCSNGAKCVDLGNSYLCRCQTGFSGRYCEDNVDDCASSPCANGGTCRDSVNDFSCTCP 469 

:H : :lh ::: II hi h : III : I lh I hi! ::| I! 
Qy 1040 LNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICP 1099 

Db 470 PGYTGRNC--SAP-V-SR--CEHAPCHNGATCHQRGQRYMCECAQGYGGANCQFLLP 520

lh! I hi I :| h: 1:111 I I :|:| II I :|: I" 
Qy 1100 EGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEKLVS 1157 



RESULT 13 

ID DLL1_MOUSE STANDARD; PRT; 722 AA.

AC Q61483; 

DT 01-NOV-1997 (REL. 35, CREATED)

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE) 

DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE) 

DE DELTA-LIKE PROTEIN 1 PRECURSOR (DELTA1). 

GN DLL1.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=BALB/C X C57BL/6; TISSUE=EMBRYO;

RX MEDLINE; 95401858.

RA BETTENHAUSEN B., DE ANGELIS M.H., SIMON D., GUENET J.-L., GOSSLER A.;

RT "Transient and restricted expression during mouse embryogenesis of

RT Dll1, a murine gene closely related to Drosophila Delta.";

RL DEVELOPMENT 121:2407-2418(1995). 

CC -!- FUNCTION: MAY BE INVOLVED IN CELL-TO-CELL COMMUNICATION IN 

CC MAMMALIAN EMBRYOS. MAY HAVE A ROLE IN CELLULAR INTERACTIONS 

CC UNDERLYING SOMITOGENESIS AND DEVELOPMENT OF THE NERVOUS SYSTEM.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN. 

CC -!- TISSUE SPECIFICITY: IN THE EMBRYO, EXPRESSED IN THE PARAXIAL 

CC MESODERM AND NERVOUS SYSTEM. EXPRESSED AT HIGH LEVELS IN ADULT 

CC HEART AND AT LOWER LEVELS IN ADULT LUNG.

CC -!- DEVELOPMENTAL STAGE: EXPRESSED UNTIL DAY 15 IN THE EMBRYO. 

CC EXPRESSION THEN DECREASES AND INCREASES AGAIN IN THE ADULT. 

CC -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: TO DROSOPHILA DELTA PROTEIN.

CC 



CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).



DR EMBL; X80903; G806570; -.
DR MGD; MGI:104659; DLL1.
DR PROSITE; PS00010; ASX_HYDROXYL; 3.
DR PROSITE; PS00022; EGF_1; 8.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 2.
DR PFAM; PF00008; EGF; 6.
DR HSSP; P00740; 1IXA.
KW SIGNAL; EGF-LIKE DOMAIN; GLYCOPROTEIN; TRANSMEMBRANE.



FT   SIGNAL        1     17       POTENTIAL.
FT   CHAIN        18    722       DELTA-LIKE PROTEIN 1.
FT   DOMAIN       18    545       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    546    568       POTENTIAL.
FT   DOMAIN      569    722       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN      225    253       EGF-LIKE 1.
FT   DOMAIN      256    284       EGF-LIKE 2.
FT   DOMAIN      291    324       EGF-LIKE 3.
FT   DOMAIN      331    362       EGF-LIKE 4, CALCIUM BINDING (POTENTIAL).
FT   DOMAIN      369    401       EGF-LIKE 5.
FT   DOMAIN      408    439       EGF-LIKE 6.
FT   DOMAIN      446    477       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      484    515       EGF-LIKE 8.
FT   DISULFID    225    236       BY SIMILARITY.
FT   DISULFID    229    242       BY SIMILARITY.
FT   DISULFID    244    253       BY SIMILARITY.
FT   DISULFID    256    267       BY SIMILARITY.
FT   DISULFID    262    273       BY SIMILARITY.
FT   DISULFID    275    284       BY SIMILARITY.
FT   DISULFID    291    303       BY SIMILARITY.
FT   DISULFID    297    313       BY SIMILARITY.
FT   DISULFID    315    324       BY SIMILARITY.
FT   DISULFID    331    342       BY SIMILARITY.
FT   DISULFID    336    351       BY SIMILARITY.
FT   DISULFID    353    362       BY SIMILARITY.
FT   DISULFID    369    380       BY SIMILARITY.
FT   DISULFID    374    390       BY SIMILARITY.
FT   DISULFID    392    401       BY SIMILARITY.
FT   DISULFID    408    419       BY SIMILARITY.
FT   DISULFID    413    428       BY SIMILARITY.
FT   DISULFID    430    439       BY SIMILARITY.
FT   DISULFID    446    466       BY SIMILARITY.
FT   DISULFID    468    477       BY SIMILARITY.
FT   DISULFID    484    495       BY SIMILARITY.
FT   DISULFID    489    504       BY SIMILARITY.
FT   DISULFID    506    515       BY SIMILARITY.
FT   CARBOHYD    476    476       POTENTIAL.
SQ SEQUENCE 722 AA; 78448 MW; 5A647702 CRC32;



Query Match 5.5%; Score 627; DB 1; Length 722;

Best Local Similarity 37.0%; Pred. No. 1.71e-103;

Matches 88; Conservative 55; Mismatches 82; Indels 13; Gaps 9; 

Db 295 KPCRNGATCTNTGOGSYTCSCRPGYTGANCELEVDECAPSPCKNGASC-TD-LEDSFSC 351 

:|| : I |:| I: I :|:: : I ::|||:|::| : ||:| I 
Qy 921 NPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFWC 980

Db 352 TCPPGFYGKVCELSAMTCADGPCFNGGRCSDNPDGGYTCHCPLGFSGFNCEKKMDLCG-S 410 

: II I II:: II I I : I I : III || :;| || |:|:|: 
Qy 981 ICADGFEGENCEVNVDDCEDNDCENNSTCVDGINN-YTCLCPPEYTGELCEEKLDFCAQD 1039 

Db 411 -SPCSNGAKCVDLGNSYLCRCQAGFSGRYCEDNVDDCASSPCANGGTCRDSVNDFSCTCP 469 

• :H : =11: ::: I I :|: I h : III : I ||: I hll ::| II 
Qy 1040 LNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICP 1099

Db 470 PGYTGKNC--SAP-V-SR--CEHAPCHNGATCHQRGQRYMCECAQGYGGPNCQFLLP 520 

I: | |: I : !:: |:||| | | :|:| || | :|: :: 
Qy 1100 EGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEKLVS 1157 



RESULT 14 

ID NTC4_MOUSE STANDARD; PRT; 1964 AA.
AC P31695; Q62389; 
DT 01-JUL-1993 (REL. 26, CREATED) 
DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE) 
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 
DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 4 PRECURSOR (TRANSFORMING 
DE PROTEIN INT-3). 

GN NOTCH4 OR INT3 OR INT-3.
OS MUS MUSCULUS (MOUSE).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 92194507.
RA ROBBINS J., BLONDEL B.J., GALLAHAN D., CALLAHAN R.;
RT "Mouse mammary tumor gene int-3: a member of the notch gene family
RT transforms mammary epithelial cells.";
RL J. VIROL. 66:2594-2599(1992).
RN [2]
RP REVISIONS, SEQUENCE FROM N.A.
RA CALLAHAN R.;
RL SUBMITTED (NOV-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.
RN [3]
RP SEQUENCE FROM N.A.
RC TISSUE=LUNG, AND TESTIS;
RX MEDLINE; 96281668.
RA UYTTENDAELE H., MARAZZI G., WU G., YAN Q., SASSOON D., KITAJEWSKI J.;
RT "Notch4/int-3, a mammary proto-oncogene, is an endothelial
RT cell-specific mammalian Notch gene.";
RL DEVELOPMENT 122:2251-2259(1996).
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DISEASE: ACTIVATED INT-3 TRANSFORMS MAMMARY EPITHELIAL CELLS.
CC -!- SIMILARITY: CONTAINS 29 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 CDC10/SWI6 REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC

DR EMBL; M80456; G1714084; -.
DR EMBL; U43691; G1401160; -.
DR PIR; A38072; TVMVT3.
DR MGD; MGI:107471; NOTCH4.
DR PROSITE; PS00010; ASX_HYDROXYL; 11.
DR PROSITE; PS00022; EGF_1; 28.
DR PROSITE; PS01186; EGF_2; 21.
DR PROSITE; PS01187; EGF_CA; 9.
DR PFAM; PF00008; EGF; 26.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 2.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE;
KW GLYCOPROTEIN; PROTO-ONCOGENE; ANK REPEAT; SIGNAL.



FT SIGNAL        1     20  POTENTIAL.
FT CHAIN        21   1964  NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 4.
FT DOMAIN       21   1443  EXTRACELLULAR (POTENTIAL).
FT TRANSMEM   1444   1464  POTENTIAL.
FT DOMAIN     1465   1964  CYTOPLASMIC (POTENTIAL).
FT DOMAIN       21     60  EGF-LIKE 1.
FT DOMAIN       61    112  EGF-LIKE 2.
FT DOMAIN      115    152  EGF-LIKE 3.
FT DOMAIN      153    189  EGF-LIKE 4.
FT DOMAIN      191    229  EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      231    271  EGF-LIKE 6.
FT DOMAIN      273    309  EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      311    350  EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      352    388  EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      389    427  EGF-LIKE 10.
FT DOMAIN      429    470  EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
Query Match 5.5%; Score 622; DB 1; Length 1964;

Best Local Similarity 36.3%; Pred. No. 2.20e-102;

Matches 87; Conservative 55; Mismatches 87; Indels 11; Gaps 10; 

Db 692 CISTPCAHGGTCHPQPSG-YNCTCPAGYMGLTCSEEVTACHSGPCLNGGSCSIRP--E-G 747

hi II : Mi::: I MM |::| I : II I II :||:| :: I I 
Qy 918 CLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDG 977 

Db 748 YSCTCLPSHTGRHCQTAVDHCVSASCLNGGTCVNKPGTFFCLCATGFOGLHCEEKTNPSC 807 

: M : I :!: II I | | :|||: ; |||: : | |||| ; | 
Qy 978 FWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLD-FC 1036 

Db 808 A-D-SPCRNKATCQDTPRGARCLCSPGYTGSSCQTLIDLCARKPCPHTARCLQSGPSFQC 865 

I I 'II:: : I ||:| :| | |: :| | : : : |: :; :; 

Qy 1037 AQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTC 1096 

Db 866 LCLQGWTGALCDFPLSCQMAAMSQGIEISGL-CQNGGLCIDTGSSYFCRCPPGFQGKLCO 924 

:| :| :| :|:|: : I ::: : ||||: || : :|:| 1 1 : 1 1 |: 
Qy 1097 ICPEGYSGLFCEFS-P-PMV-LPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCE 1153 



RESULT 15

ID CRB_DROME STANDARD; PRT; 2139 AA.

AC P10040;

DT 01-MAR-1989 (REL. 10, CREATED)

DT 01-MAY-1991 (REL. 18, LAST SEQUENCE UPDATE)

DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE)

DE CRUMBS PROTEIN PRECURSOR (95F).

GN CRB.

OS DROSOPHILA MELANOGASTER (FRUIT FLY).

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=OREGON-R; TISSUE=EMBRYO;

RX MEDLINE; 90263104. 

RA TEPASS U., THERES C., KNUST E.;

RT "Crumbs encodes an EGF-like protein expressed on apical membranes of 

RT Drosophila epithelial cells and required for organization of 

RT epithelia."; 

RL CELL 61:787-799(1990). 

RN [2] 

RP SEQUENCE OF 1663-1955 FROM N.A. 

RX MEDLINE; 87218537. 

RA KNUST E., DIETRICH U., TEPASS U., BREMER K.A., WEIGEL D.,

RA VAESSIN H., CAMPOS-ORTEGA J.A.;

RT "EGF homologous sequences encoded in the genome of Drosophila 

RT melanogaster, and their relation to neurogenic genes."; 

RL EMBO J. 6:761-766(1987). 

CC -!- FUNCTION: MAY PLAY A ROLE IN THE DEVELOPMENT OF EPITHELIA,
CC POSSIBLY FOR THE ESTABLISHMENT AND/OR MAINTENANCE OF CELL
CC POLARITY. IT MAY ACT AS A SIGNAL.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- PTM: PHOSPHORYLATED IN THE CYTOPLASMIC DOMAIN (POTENTIAL).
CC -!- SIMILARITY: CONTAINS 29 EGF-LIKE DOMAINS.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; M33753; G552087; ALT SEQ. 

DR EMBL; X05144; E1746; -. 

DR EMBL; X05144; G929536; -. 

DR PIR; B26637; B26637. 

DR PIR; A35672; A35672. 

DR FLYBASE; FBgn0000368; crb.

DR PROSITE; PS00010; ASX_HYDROXYL; 15.

DR PROSITE; PS00022; EGF_1; 26.

DR PROSITE; PS01186; EGF_2; 17.

DR PROSITE; PS01187; EGF_CA; 15.

DR PFAM; PF00008; EGF; 26.

DR PFAM; PF00054; laminin_G; 3.

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE; 

KW GLYCOPROTEIN; SIGNAL; PHOSPHORYLATION. 
POTENTIAL.
FT CARBOHYD     96     96  POTENTIAL.
FT CARBOHYD    198    198  POTENTIAL.
FT CARBOHYD    238    238  POTENTIAL.
FT CARBOHYD    239    239  POTENTIAL.
FT CARBOHYD    336    336  POTENTIAL.
FT CARBOHYD    400    400  POTENTIAL.
FT CARBOHYD    550    550  POTENTIAL.
FT CARBOHYD    565    565  POTENTIAL.
FT CARBOHYD    736    736  POTENTIAL.
FT CARBOHYD    746    746  POTENTIAL.
FT CARBOHYD    860    860  POTENTIAL.
FT CARBOHYD    884    884  POTENTIAL.
FT CARBOHYD    976    976  POTENTIAL.
FT CARBOHYD   1102   1102  POTENTIAL.
FT CARBOHYD   1114   1114  POTENTIAL.
FT CARBOHYD   1138   1138  POTENTIAL.
FT CARBOHYD   1192   1192  POTENTIAL.
FT CARBOHYD   1245   1245  POTENTIAL.
FT CARBOHYD   1255   1255  POTENTIAL.
FT CARBOHYD   1354   1354  POTENTIAL.
FT CARBOHYD   1363   1363  POTENTIAL.
FT CARBOHYD   1441   1441  POTENTIAL.
FT CARBOHYD   1454   1454  POTENTIAL.



Note: remainder of annotations omitted.
Query Match 5.4%; Score 609; DB 1; Length 2139;

Best Local Similarity 44.4%; Pred. No. 1.67e-99;
Matches 83; Conservative 34; Mismatches 62; Indels 8; Gaps 5;

Db 659 C-GNGICRNEKGS-YKCYCTPGFTGVHCDSDVDECLSFPCLNGATCHNKI---NAYECVC 713

I H I::: hi Ml I II : |:| || :|;||| | ::; |:| 
Qy 923 CKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFWCIC 982

Db 714 QPGYEGENCEVDIDECGSNPCSNGSTCIDRINNFTCNCIPGMRGRICDIDIDDCVGD--P 771 

!:IHIII|::|:| I I I llhl MM II I :|: :| | | | 
Qy 983 ADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDFCAQDLNP 1042

Db 772 CLNGGQCIDQLGGFRCDCSGTGYEGENCELNIDECLSNPCTNGAKCLDRVKDYFCDCHNG 831 

I : : II 1 1 : 1 1 1 : II ll:|::::|:l I I III I I |: I I I :| 
Qy 1043 CQHDSKCILTPKGFKCDCT-PGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICPEG 1101

Db 832 YKGKNCE 838 

I I II 
Qy 1102 YSGLFCE 1108



Search completed: Fri May 28 08:31:05 1999 
Job time: 125 secs.
Result         Query
   No.  Score  Match  Length  DB  ID      Description             Pred. No.

    21    640    5.7     642  13  P79941  NOTCH LIGAND X-DELTA-2  2.61e-96
    22    649    5.7     721  13  Q91902  X-DELTA-1.              4.07e-98
    23    648    5.7    1193  13  Q90819  C-SERRATE-1 PROTEIN (FR 6.46e-98
    24    647    5.7    1203  11  Q06008  NOTCH PROTEIN HOMOLOG   1.03e-97
    25    625    5.5     802  13  O57462  DELTAA.                 2.66e-93
    26    617    5.5    1964  11  O35442  NOTCH4.                 1.06e-91
    27    613    5.4     717  13  P87357  DELTAD TRANSMEMBRANE P  6.72e-91
    28    615    5.4     752  13  O42374  NOTCH RECEPTOR PROTEIN  2.67e-91
    29    612    5.4     955   4  Q99466  NOTCH4 (FRAGMENT).      1.06e-90
    30    611    5.4    1999   4  Q99940  NOTCH4.                 1.69e-90
    31    612    5.4    2003   4  O00306  NOTCH4.                 1.06e-90
    32    595    5.3    1476  13  Q90285  PUTATIVE EXTRACELLULAR  2.64e-87
    33    591    5.2    1202  11  P97607  JAGGED2 (FRAGMENT).     1.66e-86
    34    591    5.2    1372   5  P91526  SIMILARITY TO MULTIPLE  1.66e-86
    35    575    5.1     434  11  O55139  JAGGED2 PROTEIN (FRAGM  2.53e-83
    36    554    4.9     832   5  Q99108  NEUROGENIC LOCUS DELTA  3.71e-79
    37    553    4.9    1687  11  Q61204  NOTCH2-LIKE (EGF REPEA  5.85e-79
    38    538    4.8     263   4  Q99734  NOTCH2 TRANSMEMBRANE P  5.40e-76
    39    533    4.7     518  11  O70219  JAGGED 2 (JAGGED 2 PRO  5.24e-75
    40    507    4.5     473   5  Q25464  ADHESIVE PLAQUE MATRIX  6.79e-70
    41    505    4.5    1722   5  Q19350  SIMILAR TO EGF-LIKE RE  1.68e-69
    42    496    4.4     585  11  O35675  M-DELTA-LIKE 3 GENE PR  9.69e-68
    43    494    4.4     589  11  O88671  DELTA 3.                2.39e-67
    44    492    4.4     592  11  O88516  DELTA-LIKE 3 ALTERNATE  5.87e-67
    45    486    4.3    1091  11  P70193  MEMBRANE GLYCOPROTEIN.  8.71e-66

Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 08:31:23 1999; MasPar time 86.17 Seconds
        965.924 Million cell updates/sec

Tabular output not generated.

Title: >US-09-191-647-2
Description: (1-1525) from US09191647.pep

Perfect Score: 11299
Sequence: 1 MRGVGWQMLSLSLGLVLAIL SSFVDEVEKVVKCGCTRCVS 1525

Scoring table: PAM 150
               Gap 11

Searched: 179066 seqs, 54579741 residues

Post-processing: Minimum Match 0%
                 Listing first 45 summaries

Database: sptrembl9

1:sp_archea 2:sp_bacteria 3:sp_fungi 4:sp_human
5:sp_invertebrate 6:sp_mammal 7:sp_mhc 8:sp_organelle
9:sp_phage 10:sp_plant 11:sp_rodent 12:sp_unclassified
13:sp_vertebrate 14:sp_virus
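
The header above names the Smith-Waterman algorithm with a PAM 150 scoring table and a gap penalty of 11. As a rough illustration of the local-alignment recurrence only, the sketch below scores two short strings. It is not the search program's implementation: the function name `smith_waterman_score` is hypothetical, and the simple match/mismatch values and linear gap penalty are assumptions standing in for the PAM 150 table and the program's actual gap model.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of strings a and b.

    Assumed scoring (illustrative only): +2 for identical residues,
    -1 for different residues, -2 per gap position (linear gaps).
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for ca in a:
        curr = [0]  # first column is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Each evaluation of this max() is one "cell update".
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Short fragments resembling the EGF-like cysteine motifs in this listing.
print(smith_waterman_score("XXCDCXX", "YYCDCYY"))
print(smith_waterman_score("ACDEF", "ACEF"))
```

The inner loop visits one matrix cell per pair of residues, which is presumably the unit counted by the "cell updates/sec" figure in the header.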

Statistics: Mean 54.443; Variance 108.152; scale 0.503

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.
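
The Query Match and Best Local Similarity percentages reported for each result appear to be simple ratios of the other printed figures. The sketch below reproduces them for RESULT 1. The formulas are inferred from the numbers in this listing rather than taken from the program's documentation, and the function names are hypothetical.

```python
def query_match_pct(score, perfect_score):
    """Alignment score as a percentage of the query's perfect score."""
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all aligned columns."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Figures copied from RESULT 1 of this listing (Perfect Score: 11299).
print(f"{query_match_pct(8111, 11299):.1f}")                    # 71.8
print(f"{best_local_similarity_pct(1011, 291, 203, 10):.1f}")   # 66.7
```

The same ratios give 68.7 and 65.3 for RESULT 2 (score 7766; 984, 284, 229 and 11 columns), which matches the values printed for that result.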

SUMMARIES

Result         Query
   No.  Score  Match  Length  DB  ID      Description             Pred. No.

     1   8111   71.8    1523  11  O88280  MEGF5.                  0.00e+00
     2   7766   68.7    1531  11  O88279  MEGF4.                  0.00e+00
     3   3678   32.6     739   4  O75094  MEGF5 (FRAGMENT).       0.00e+00
     4   1470   13.0     530   5  Q24526  SLIT LOCUS ENCODING A   3.00e-268
     5   1107    9.8     601   5  Q20204  F40E10.4 PROTEIN (FRAG  88e-192
     6    748    6.6     406   5  Q25059  FIBROPELLIN III (FRAGM  B8e-11B
     7    723    6.4     529   5  Q25058  FIBROPELLIN IA (FRAGME  63e-li:
     8    704    6.2    2653   5  Q25253  NOTCH HOMOLOG SCALLOPE  27e-10!
     9    687    6.1    2531   5  O16004  NOTCH HOMOLOG.          95e-10l
    10    680    6.0    2470  11  O35516  CELL SURFACE PROTEIN.   32e-10<
    11    666    5.9     728  13  Q90656  TRANSMEMBRANE PROTEIN   54e-10:
    12    666    5.9    2352   5  O61240  HRNOTCH PROTEIN.        54e-10:
    13    672    5.9    2447  13  O13149  NOTCH 2 (FRAGMENT).     53e-10:
    14    657    5.8     615  13  O57409  DELTAB.                 O0e-99
    15    659    5.8    1212  13  O42347  C-SERRATE-2 (FRAGMENT)  96e-100
    16    652    5.8    1218   4  O15122  JAGGED1.                01e-98
    17    650    5.8    1218   4  O14902  TRANSMEMBRANE PROTEIN   56e-98
    18    650    5.8    1218   4  Q15816  TRANSMEMBRANE PROTEIN   56e-98
    19    650    5.8    1219  11  Q63722  JAGGED PROTEIN.         56e-98
    20    650    5.8    1227   4  P78504  JAGGED 1 (TRANSMEMBRAN  56e-98



RESULT 1
ID O88280 PRELIMINARY; PRT; 1523 AA.
AC O88280;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF5.
GN MEGF5.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011531; D1033424; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1523 AA; 167767 MW; 2BD845D0 CRC32;



Query Match 71.8%; Score 8111; DB 11; Length 1523;

Best Local Similarity 66.7%; Pred. No. 0.00e+00;

Matches 1011; Conservative 291; Mismatches 203; Indels 10; Gaps S 

Db 16 LALALALASILSGPPAAACPTKCTCSAASVDCHGLGLRAVPRGIPRNAERLDLDRNNITR 75 

:hl II Ih III: 1 : 1 1 : : : 1 1 1 1 1 1 : 1 1 : 1 1 1 lllhlllll: INN 
Qy 11 LSLGLVLA-ILNKVAPQACPAQCSCSGSTVDCHGLALRSVPRNIPRNTERLDLNGNNITR 69 

Db 76 ITKMDFTGLKNLRVLHLEDNQVSVIERGAFQDLKQLERLRLNKNKLQVLPELLFQSTPKL 135 

III !I:||::|IM:| :| :| 1 1 1 1 1 1 1 1 1 1 : 1 1 1 1 1 1 1 : 1 l|::|llll :hll 
Qy 70 ITKTDFAGLRHLRVLQLMENKISTIERGAFQDLKELERLRLNRNHLQLFPELLFLGTAKL 129 

Db 136 TRLDLSENQIQGIPRKAFRGVTGVKNLQLDNNHISCIEDGAFRALRDLEILTLNNNNISR 195 

lllllllll|:|MIIIII :IHIII l:lllllllllllllll|:||||||||:| 
Qy 130 YRLDLSENQIQAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITR 189 

Db 196 ILVTSFNHMPKIRTLRLHSNHLYCDCHLAWLSDWLRQRRTIGQFTLCMAPVHLRGFSVAD 255 

: I:lllllll:||:lllll:lllllllllllllll I :| :| Ihl III! :||: 
Qy 190 LSVASFNHMPKLRTFRLHSNNLYCDCHLAWLSDWLRKRPRVGLYTQCMGPSHLRGHNVAE 249

Db 256 VQKKEYVCPGPH-S-EA-PACNANSLSCPSACSCSNNIVDCRGKGLTEIPANLPEGIVEI 312

llhhlh : :: I : : I 1 1 : 1 1 : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 : 1 1 1 1 I 1 1
Qy 250 VQKREFVCSDEEEGHQSFMAPSCSVLHCPAACTCSNNIVDCRGRGLTEIPTNLPETITEI 309

Db 313 RLEQNSIKSIPAGAFIQYKKLKRIDISKNQISDIAPDAFQGLKSLTSLVLYGNKITEIPK 372

l!MI:l! Ihlll llt:lll:|:IMI::MIIIII!:| 1 1 1 1 1 1 1 1 1 1 1 : 1 1
Qy 310 RLEQNTIKVIPPGAFSPYKKLRRIDLSNNQISELAPDAFQGLRSLNSLVLYGNKITELPK 369

Db 373 GLFDGLVSLQLLLLNANKINCLRVNTFQDLQNLNLLSLYDNKLQTISKGLFAPLQSIQTL 432

hi! !!lll!llllllllllh:!l!!:!llllllllllllll;ll hlh:|lh
Qy 370 SLFEGLFSLQLLLLNANKINCLRVDAFQDLHNLNLLSLYDNKLQTIAKGTFSPLRAIQTM 429

Db 433 HLAQNPFVCDCHLKWLADYLQDNPIETSGARCSSPRRLANKRISQIKSKKFRCSGSEDYR 492

IIMIIMMMMMMM MIIIIMIMMIIIIIMMIIIIIIMIIMM
Qy 430 HLAQNPFICDCHLKWLADYLHTNPIETSGARCTSPRRLANKRIGQIKSKKFRCSGTEDYR 489

Db 493 NRFSSECFMDLVCPEKCRCEGTIVDCSNQKLSRIPSHLPEYTTDLRLNDNDIAVLEATGI 552

:::|::M II llllllllll llllllll::|| 1 : 1 : 1 1 : : 1 1 1 1 : 1 : : : 1 1 1 1 1 1 1
Qy 490 SKLSGDCFADLACPEKCRCEGTIVDCSNQKLNKIPEHIPQYTAELRLNNNEFTVLEATGI 549

Db 553 FKKLPNLRKINLSNNRIKEVREGAFDGAAGVQELMLTGNQLETMHGRMFRGLSGLKTLML 612

Mill 11111:111:1 :: lllhlhll |::||:|:|| :: :||:|| :||||||
Qy 550 FKKLPQLRKINFSNNKITDIEEGAFEGASGVNEILLTSHRLENVQHKMFKGLESLKTLML 609

Db 613 RSNLISCVNNDTFAGLSSVRLLSLYDNRITTISPGAFTTLVSLSTINLLSNPFNCNCHMA 672
III I : f I I: lllllimill|:|||::|||l II 1 1 1 1 : 1 II : II 1 1 1 1 1 :|
Qy 610 RSNRITCVGNDSFIGLSSVRLLSLYDNQITTVAPGAFDTLHSLSTLNLLANPFNCNCYLA 669

Db 673 WLGRWLRKRRIVSGNPRCQKPFFLKEIPIQDVAIQDFTCE-GNEENSCQLSPRCPEQCTC 731
III llll:|l|:|lllllll:||||lllllllllllll: ll::HI :||| ;|||
Qy 670 WLGEWLRKKRIVTGNPRCQKPYFLKEIPIQDVAIQDFTCDDGNDDNSCSPLSRCPTECTC 729

Db 732 VETVVRCSNRGLHTLPKGMPKDVTELYLEGNHLTAVPKELSTFRQLTLIDLSNNSISMLT 791

"llllllhll 1 1 1 1 : 1 : 1 1 1 1 1 1 1 : 1 1 : : | Mill :::||||||||| || |:
Qy 730 LDTVVRCSNKGLKVLPKGIPRDVTELYLDGNQFTLVPKELSNYKHLTLIDLSNNRISTLS 789

Db 792 NHTFSNMSHLSTLILSYNRLRCIPVHAFNGLRSLRVLTLHGNDISSVPEGSFNDLTSLSH 851
|::IHI::| 1 1 1 1 1 1 1 1 1 1 1 1 1 : : 1 : 1 1 : 1 1 1 : 1 : 1 1 1 1 1 1 1 1 1 1 1 : 1 1 1 1 :: 1 1 1
Qy 790 NQSFSNMTQLLTLILSYNRLRCIPPRTFDGLKSLRLLSLHGNDISVVPEGAFNDLSALSH 849

Db 852 LALGIHPLHCDCSLRWLSEWIKAGYKEPGIARCSSPESMADRLLLTTPTHRFQCKGPVDI 911

1:1 III |||:::|||:|:|: |||||llll::| llhlllllh :| I llll:
Qy 850 LAIGANPLYCDCNMQWLSDWVKSEYKEPGIARCAGPGEMADKLLLTTPSKKFTCQGPVDV 909

Db 912 NIVAKCNACLSSPCKNNGTCSQDPVEQYRCTCPYSYKGKDCTVPINTCVQNPCQHGGTCH 971
ll:||||:IM:||||:|l|: III: lllllll::ll II lll::|: III Mil
Qy 910 NILAKCNPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCH 969

Db 972 LSESHRDGFSCSCPLGFEGQRCEINPDDCEDNDCENSATCVDGINNYACVCPPNYTGELC 1031

I I: III I I: Mil: Mi 1 1 1 1 1 1 1 1 1 1 :: 1 1 1 1 1 1 1 1 1 : 1 : 1 1 1 : 1 1 1 1 1 1
Qy 970 LKEGEEDGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELC 1029

Db 1032 DEVIDYCVPEMNLCQHEAKCISLDKGFRCECVPGYSGKLCETDNDDCVAHKCRHGAQCVD 1091

MM ::| |||::||| |||:|:| ||| I |: I III MMMIM I
Qy 1030 EEKLDFCAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTD 1089

Db 1092 AVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCIVVQQEPTCRCPPGFAG 1151

IMIMIIMMMMIM Mill MMII ::|||||||!l || IM ||: I
Qy 1090 AVNGYTCICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQG 1149

Db 1152 PRCEKLITVNFVGKDSYVELASAKVRPQANISLQVATDKDNGILLYKGDNDPLALELYQG 1211

MIIMMIM |:||::::|||||||:||:||:||| 1 : 1 1 1 1 1 1 1 1 : 1 MMIM
Qy 1150 EKCEKLVSVNFINKESYLQIPSAKVRPQTNITLQIATDEDSGILLYKGDRDHIAVELYRG 1209

Db 1212 HVRLVYDSLSSPPTTVYSVETVNDGQFHSVELVMLNQTLNLVVDKGAPKSLGKLQKQPAV 1271

M ||: I i : : : : II III: MMM Mill: M ||:::

Qy 1210 RVRASYDTGSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITNLSKQSTL 1269

Db 1272 GINSPLYLGGIPTSTGLSALRQGADRPLGGFHGCIHEVRINNELQDFKALPPQSLGVSPG 1331

"1111:11:1 : :::ll|:: : :|||||::: MUM :| |: |: ||
Qy 1270 NFDSPLYVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQDFQKVPMQT-GILPG 1328



Db 1332 CKSC- -TVCRHGLCRSVEKDSWCECHPGWTGPLCDQEAQDPCLGHSCSHGTCVATGN- S 1388 

I M || || h: : |||: I |||||| : MMM | MM:; 

Qy 1329 CEPCHKKVCAHGTCQPSSQAGFTCECQEGWMGPLCDQRTKDPCLGNKCVHGTCLPINAFS 1388 

Db 1389 YVCKCAEGYEGPLCDQKNDSANACSAFKCHHGQCHISDRGEPYCLCQPGFSGNHCEQENP 1448 

I III II I III: :l hi Ml II MM Ml I :|::|: MM : 

Qy 1389 YSCKCLEGHGGVLCDEEEDLFNPCQAIKCKHGKCRLSGLGQPYCECSSGYTGDSCDREIS 1448 

Db 1449 CLGEIVREAIRRQKDYASCATASKVPIMVCRGGC-GSQCCQPIRSKRRKYVFQCTDGSSF 1507 

I II M: :M ||:| |: M : Mill hill MMM! hlllllll 

Qy 1449 CRGERIRDYYQKQQGYAACQTTKKVSRLECRGGCAGGQCCGPLRSKRRKYSFECTDGSSF 1508 



Db 1508 VEEVERHLECGCREC 1522

Mil: : HI | 

Qy 1509 VDEVEKVVKCGCTRC 1523



RESULT 2 

ID O88279 PRELIMINARY; PRT; 1531 AA.
AC O88279;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF4.
GN MEGF4.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011530; D1033423; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1531 AA; 167497 MW; 5C5EBDF4 CRC32;

Query Match 68.7%; Score 7766; DB 11; Length 1531;

Best Local Similarity 65.3%; Pred. No. 0.00e+00;

Matches 984; Conservative 284; Mismatches 229; Indels 11; Gaps 7; 

Db 28 RLGATACPALCTCTGTTVDCHGTGLQAIPKNIPRNTERLELNGNNITRIHKNDFAGLKQL 87

:::: llll l:|:|:llllll : I : : :| : 1 1 1 1 1 1 1 1 1 : 1 1 1 1 II 1 1 1 I MM:: 
Qy 22 KVAPQACPAQCSCSGSTVDCHGLALRSVPRNIPRNTERLDLNGNNITRITKTDFAGLRHL 81 

Db 88 RVLQLMENQIGAVERGAFDDMKELERLRLNRNQLQVLPELLFQNNQALSRLDLSENSLQA 147 

1 1 1 1 1 1 1 1 M:MIMMMIIIMMMIIM|::!MI I MMII Ml 
Qy 82 RVLQLMEMKISTIERGAFQDLKELERLRLNRNHLQLFPELLFLGTAKLYRLDLSENQIQA 141 

Db 148 VPRKAFRGATDLKNLQLDKNQISCIEEGAFRALRGLEVLTLNNNNITTIPVSSFNHMPKL 207 

: 1 1 1 S 1 1 1 1 MMII llllllhlllllll MIIMMIM ::|:|||||||| 
Qy 142 IPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPKL 201 

Db 208 RTFRLHSNHLFCDCHLAWLSQWLRQRPTIGLFTQCSGPASLRGLNVAEVQKSEFSCSGQG 267 

1 1 1 1 1 1 1 1 : 1 : 1 1 1 1 1 1 1 1 1 : 1 1 1 1 1. : 1 1 : 1 1 1 II: III 1 1 1 1 1 1 1 II II : 
Qy 202 RTFRLHSNNLYCDCHLAWLSDWLRKRPRVGLYTQCMGPSHLRGHNVAEVQKREFVCSDEE 261 

Db 268 EAAQV-PACTLSSGSCPAMCSCSNGIVDCRGKGLTAIPANLPETMTEIRLELNGIKSIPP 326 

hi I : I III hill llllllllll Ihllllhllllll I II III 
Qy 262 EGHQSFMAPSCSVLHCPAACTCSNNIVDCRGKGLTEIPTNLPETITEIRLEQNTIKVIPP 321 

Db 327 GAFSPYRKLRRIDLSNNQIAEIAPDAFQGLRSLNSLVLYGNKITDLPRGVFGGLYTLQLL 386 

1 1 1 M : 1 1 1 1 M M 1 1 1 1 : 1 : 1 1 1 1 1 1 1 : I ! : : : ||::|||| 

Qy 322 GAFSPYKKLRRIDLSNNQISELAPDAFQGLRSLNSLVLYGNKITELPKSLFEGLFSLQLL 381 

Db 387 LLNANKINCIRPDAFQDLQNLSLLSLYDNKIQSLAKGTFTSLRAIQTLHLAQNPFICDCN 446
MIIIIIMM

Qy 382 LLNANKINCLRVDAFQDLHNLNLLSLYDNKLQTIAKGTFSPLRAIQTMHLAQNPFICDCH 441

Db 447 LKWLADFLRTNPIETTGARCASPRRLANKRIGQIKSKKFRCSAKEQYFIPGTEDYHLNSE 506 

Qy 442 LKWLADYLHTNPIETSGARCTSPRRLANKRIGQIKSKKFRCSGTEDY-RSK— -LSGD 495 

Db 507 CTSDVACPHKCRCEASWECSGLKLSKIPERIPQSITELRLNNNEISILEAIGLFKKLSH 566 

I MMM Mil!:: hll IIMIIIMII I M 1 1 1 1 1 1 1 : : : 1 1 1 1 1 : 1 1| I ; ; 
Qy 496 CFADLACPERCRCEGTTVDCSNQRLNRIPEHIPQYTAELRLNNNEFTVLEATGIFRKLPQ 555 

Db 567 LKRINLSNNKVSEIEDGTFEGATSVSELHLTANQLESVRSGMFRGLDGLRTLMLRNNRIS 626 

I:|||:||||:::||:|:||||::|:|: l|:|:||:|: 1 1 : 1 1 : : 1 : 1 1 1 1 1 : 1 1 1 : 
Qy 556 LRKINFSNNKITDIEEGAFEGASGVNEILLTSNRLENVQHKMFRGLESLRTLMLRSNRIT 615 

Db 627 CIHNDSFTGLRNVRLLSLYDNHITTISPGAFDTLQALSTLNLLANPFNCNCQLAWLGDWL 686

I: MM II :|llimil:|||::|||llll::MIIIIIIIIIII!l |||||:|| 
Qy 616 CVGNDSFIGLSSVRLLSLYDNQITTVAPGAFDTLHSLSTLNLLANPFNCNCYLAWLGEWL 675

Db 687 RRRKIVTGNPRCQNPDFLRQIPLQDVAFPDFRCEEGQEEVGCLPRPQCPQECACLDTVVR 746

l::Mlllllll: ||::||:||||: || |::| :: :| | ::| ||:||||||| 
Qy 676 RKKRIVTGNPRCQKPYFLKEIPIQDVAIQDFICDDGNDDNSCSPLSRCPTECTCLDTVVR 735



Db 747 CSNKHLQALPKGIPKNVTELYLDGNQFTLVPGQLSTFKYLQLVDLSNNKISSLSNSSFTN 806

MM I MMM::|||IMIMIIMM Ml M I IMIIIIMIMII MM 

Qy 736 CSNKGLKVLPRGIPRDVTELYLDGNQFTLVPKELSNYKHLTLIDLSNNRISTLSNQSFSN 795 

Db 807 MSQLTTLILSYNALQCIPPLAFQGIRSLRLLSLHGNDVSTLQEGIFADVTSLSHLAIGAN 866 

IMI IIIMII 1 : 1 1 i I MMIMMIIIIIIIIM : II I ! : : : I ! 1 1 ! 1 1 1 ' 
Qy 796 MTQLLTLILSYNRLRCIPPRTFDGLKSLRLLSLHGNDISWPEGAFNDLSALSHLAIGAN 855 

Db 867 PLYCDCHLRWLSSWVKTGYREPGIARCAGPPEMEGRLLLTTPAKRFECQGPPSLAVQARC 926 

M 1 1 1 1 : : : M I III: 1 1 M f 1 1 1 1 1 1 j II IIMMIMM MM : : III 
Qy 856 PLYCDCNMQWLSDWVKSEYKEPGIARCAGPGEMADRLLLTTPSKKFTCQGPVDVNILARC 915 

Db 927 DPCLSSPCQNQGTCHNDPLEVYRCTCPSGYRGRNCEVSLDSCSSNPCGNGGTCHAQEGED 986 

MIIIMI 1 : 1 1 1 : : 1 1 : r III |:||::|:|:: : Mil MIMI III: 
Qy 916 NPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCRHGGTCHLKEGEE 975 

Db 987 AGFTCSCPSGFEGLTCGMNTDDCVKHDCVNGGVCVDGIGNYTCQCPLQYTGRACEQLVDF 1046 

M I I: MM I M III Ml I : Mill MM II MM lh Ml 

Qy 976 DGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDF 1035 

Db 1047 CSPDLNPCQHEAQCVGTPEGPRCECVPGYTGDNCSRNQDDCRDHQCQNGAQCVDEINSYA 1106 

• I: Mill!!:: I: II I MM III |:;| : III |: I MIM I :|:|: 
Qy 1036 CAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYT 1095
Db 1107 CLCAEGYSGQLCEI-PP-A-PRNS-CEGTECQNGANCVDQGSRPVCQCLPGFGGPECERL 1162
IMMIIII Ml: II II I |: MIMI I: : : |:||||||: | |l| 
Qy 1096 CICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCERL 1155 

Db 1163 LSVNFVDRDTYLQFTDLQNWPRANITLQVSTAEDNGILLYNGDNDHIAVELYQGHVRVSY 1222 

:M|I:::::|||: :|::|||||::| 1 1 M 1 1 1 1 M I : II I II II I : I M I II 
Qy 1156 VSVNFINKESYLQIPSAKVRPQTNITLQIAIDEDSGILLYKGDKDHIAVELYRGRVRASY 1215 

Db 1223 DPGSYPSSAIYSAETINDGQFHTVELVTFDQMVNLSIDGGSPMTMDNFGRHYTLNSEAPL 1282 

I II IMIIII MUM II lll::MI : : 1 1 : 1 [ I : I : : |::|: III ::|| 
Qy 1216 DTGSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPRIITNLSKQSTLNFDSPL 1275 

Db 1283 YVGGMPVDVNSAAFRLWQILNGTSFHGCIRNLYINNELQDFTKTQMKPGWPGCEPCRKL 1342 

MIMI I MM IMMIIMMIIMMIIM I I I : : ! 1 1 1 1 1 : i 
Qy 1276 YVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQDFQKVPMQTGILPGCEPCHKK 1335 

Db 1343 YCLHGICQPNATPGPVCHCEAGWGGLHCDQPVDGPCHGHKCVHGKCVPLDALAYSCQCQD 1402 

I M MM: M I I: II I III : II IMIIII I : I : : I : : 1 1 1 I : 
Qy 1336 VCAHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKCVHGTCLPINAFSYSCKCLE 1395 

Db 1403 GYSGALCNQVGAVAEPCGGLQCLHGHCQASATRGAHCVCSPGFSGELCEQESECRGDPVR 1462 

! M II" : Ml :•: I II |: |: : I ||:|::|: |::| |||: :| 
Qy 1396 GHGGVLCDEEEDLFNPCQAIKCKHGKCRLSGLGQPYCECSSGYTGDSCDREISCRGERIR 1455 

Db 1463 DFHRVQRGYAICQTTRPLSWVECRGACPGQGCCQGLRLRRRRLTFECSDGTSFAEEVEKP 1522 

I: : IMM MM: : I : : 1 1 1 1 : 1 : 1 II II llll MIIMIMI Mill 



Qy 1456 DYYQRQQGYAACQTTKRVSRLECRGGCAGGQCCGPLRSKRRRYSFECTDGSSFVDEVEKV 1515 

Db 1523 TKCGCAPC 1530 

MM: I 
Qy 1516 VKCGCTRC 1523 



RESULT 3 

ID O75094 PRELIMINARY; PRT; 739 AA.
AC O75094;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF5 (FRAGMENT).
GN MEGF5.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011538; D1033429; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1
SQ SEQUENCE 739 AA; 80364 MW; DC6BCB63 CRC32;



Query Match 32.6%; Score 3678; DB 4; Length 739; 

Best Local Similarity 59.8%; Pred. No. 0.00e+00;

Matches 444; Conservative 156; Mismatches 137; Indels 5; Gaps 4 

Db 1 NSISMLTNYTFSNMSHLSTLILSYNRLRCIPVHAFNGLRSLRVLTLHGNDISSVPEGSFN 60 

I M IM : 1 1 1 1 : : 1 1111111111111 : : I M I M II M M 1 1 1 1 1 1 IMIMI 
Qy 783 NRISTLSNQSFSNMTQLLTLILSYNRLRCIPPRTFDGLRSLRLLSLHGNDISWPEGAFN 842 

Db 61 DLTSLSHLALGTNPLHCDCSLRWLSEWVKAGYKEPGIARCSSPEPMADRLLLTTPTHRFQ 120 

ll:MIIIIM:IM |||:::|||:|||: ||||||III:M IIIMIIIII: M 
Qy 843 DLSALSHLAIGANPLYCDCNMQWLSDWVKSEYREPGIARCAGPGEMADRLLLTTPSKRFT 902 



Db 121 CKGPVDINIVARCNACLSSPCKNNGTCTQDPVELYRCACPYSYKGRDCTVPINTCIQNPC 180

I IIIIMIMIIIMIIMIIIMM I M : : 1 1 1 : 1 1 1 : : 1 1 II 1 1 1 : : 1 1 III 
Qy 903 CQGPVDVNILAKCNPCLSNPCKNDGTCNSDPVDFYRCTCPYGFRGQDCDVPIHACISNPC 962



Db 181 QHGGTCHLSDSHRDGFSCSCPLGFEGQRCEINPDDCEDNDCENNATCVDGINNYVCICPP 240 

Mlllll III I I: Mil: Ihl 11111111111:111111111 IMM 
Qy 963 RHGGTCHLKEGEEDGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPP 1022 

Db 241 NYTGELCDEVIDHCVPELNLCQHEAKCIPLDRGFSCECVPGYSGRLCETDNDDCVAHRCR 300 

MIIIMM M I Ml 1 1 1 :: 1 1 1 III Ml III I |: I III :||: 
Qy 1023 EYTGELCEEKLDFCAQDLNPCQHDSRCILTPRGFRCDCTPGYVGEHCDIDFDDCQDNRCK 1082 

Db 301 HGAQCVDTINGYTCTCPQGFSGPFCEHPPPMVLLQTSPCDQYECQNGAQCIWQQEPTCR 360 

MIM I :: 1 1 1 1 1 ! I : I : M Ml MIMI MIMI ::IMIMIII II I: 
Qy 1083 NGAHCTDAVNGYTCICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQ 1142 

Db 361 CPPGFAGPRCERLITVNFVGRDSYVELASAKVRPQANISLQVATDRDNGILLYRGDNDPL 420 

I lh I MllhMII: I M I :: : M I II 1 1 1 M I M I M 1 1 1 : 1 1 1 1 1 1 1 1 : 1 : 
Qy 1143 CLPGYQGEKCEKLVSVNFINRESYLQIPSARVRPQTNITLQIATDEDSGILLYKGDRDHI 1202 

Db 421 ALELYQGHVRLVYDSLSSPPTTVYSVETVNDGQFHSVELVTLNQTLNLWDKGTPKSLGR 480 

IMIIMMI II: I I : : : : I J I M : 1 1 1 II 1 1 1 : : I : I : I : I II I || : : 
Qy 1203 AVELYRGRVRASYDTGSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITN 1262 

Db 481 LQKQPAVGINSPLYLGGIPTSTGLSALRQGTDRPLGGFHGCIHEVRINNELQDFKALPPQ 540 

I II::: "IMIMIM : :::|||: : MIMI::: Ihlllll M I 
Qy 1263 LSKQSTLNFDSPLYVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQDFQRVPMQ 1322 
Db 541 SLGVSPGCKSC-TVCKHGLCRSVEKDSWCECRPGWTGPLCDQEARDPCLGHRCHHGKC 598

: h III :l II II I:: : III: II llllll : llll|::| II I 
Qy 1323 T-GILPGCEPCHKKVCAHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKCVHGTC 1381 

Db 599 VATGT-SYMCKCAEGyGGDLCDNKNDSANACSAFKCHHGQCHISDQGEPYCLCQPGFSGE 657 

- : II III II II III: :| hi hll II |::| hilt I :|::|: 
Qy 1382 LPINAFSYSCKCLEGHGGVLCDEEEDLFNPCQAIKCKHGKCRLSGLGQPYCECSSGYTGD 1441 

Db 658 HCQQENPCLGQWREVIRRQKGYASCATASKVPIMECRGGC-GPQCCQPTRSKRRKYVFQ 716 

I::| H h -\- "I llhl I: II: :|||||| I III I I ! ! I ' 1 1 |: 
Qy 1442 SCDREI SCRGERIRDY YQKQQG YAACQTTKKVSRLECRGGCAGGQCCGPLRS KRRKYSFE 1501 

Db 717 CTDGSSFVEEVERHLECGCLAC 738 

11111111:111: : III I 
Qy 1502 CTDGSSFVDEVEKWKCGCTRC 1523 



RESULT 4 

ID Q24526 PRELIMINARY; PRT; 530 AA.

AC Q24526;

DT 01-NOV-1996 (TREMBLREL. 01, CREATED)

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE SLIT LOCUS ENCODING A PROTEIN ASSOCIATED WITH NEURAL DEVELOPMENT WITH

DE 52D EGF HOMOLOGOUS DOMAINS (FRAGMENT).

OS DROSOPHILA MELANOGASTER (FRUIT FLY). 

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=CANTON S; 

RX MEDLINE; 89077533. 

RA ROTHBERG J.M., HARTLEY D.A., WALTHER Z., ARTAVANIS-TSAKONAS S.;

RT "slit: an EGF-homologous locus of D. melanogaster involved in the

RT development of the embryonic central nervous system.";

RL CELL 55:1047-1059(1988). 

DR EMBL; M23543; G514357; -. 

DR FLYBASE; FBgn0003425; sli.

DR PROSITE; PS01186; EGF_2; 5.

DR PROSITE; PS01187; EGF_CA; 2.

DR PFAM; PF00008; EGF; 7.

DR PFAM; PF00054; laminin_G; 1.

KW NEUROGENESIS; GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 530 530

SQ SEQUENCE 530 AA; 59457 MW; 10E5764D CRC32; 



Query Match 13.0%; Score 1470; DB 5; Length 530;

Best Local Similarity 39.1%; Pred. No. 3.00e-268;

Matches 206; Conservative 132; Mismatches 168; Indels 21; Gaps 17;

Db 1 MKDKLILSTPSSSFVCRGRVRNDILAKCNACFEQPCQNQAQCVALPQREYQCLCQPGYHG 60 

I llhhlll I hi I :llllll:|: II |:: I : I |:| I |: I 
Qy 888 MADKLLLTTPSKKFTCQGPVDVNILARCNPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKG 947 

Db 61 KHCEFMIDACYGNPCRNNATCTVLE--EGRFSCQCAPGYTGARCETNIDDCLGEIKCQNN 118

I: I II Hll:: Ml : I I I II h I |:||| : |:|| 
Qy 948 QDCDVPIHACISNPCKHGGTCHLKEGEEDGFWCICADGFEGENCEVNVDDC-EDNDCENN 1006 

Db 119 ATCIDGVESYKCECQPGFSGEFCDTKIQFCSPEFNPCANGAKCMDHFTHYSCDCQAGFHG 178 

:||:lh::| I I I ::||:|: |::||: ::||| : :||: : III :|: I 
Qy 1007 STCVDGINNYTCLCPPEYTGELCEEKLDFCAQDLNPCQHDSKCILTPKGFKCDCTPGYVG 1066 

Db 179 TNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMISMMYPQTSPCQNH 238

:| ::lll|:::| II: I |::| I I ||: |:| :|| : :|: |:||||:| 
Qy 1067 EHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICPEGYSGLFCE-FSP-PMVLPRTSPCDNF 1124 

Db 239 ECKHGVCFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELEPLRTRPEANVTI 298 

:| :| I : :: :|:| III I II I I : : I : : : : | : : : : : : ||::|:|: 
Qy 1125 DCQNGA-QCIVRINEPICQCLPGYQGEKCEKLVSVNFINKESYLQIPSAKVRPQTNITL 1182

Db 299 VFSSGQN-GILMYDGQDAHLAVELFNGRIRVSYDVGNHPVSTMYSFEMVADGKYHAVELL 357 



::: llhl h hlllh Ihl III hll h:ll I : ll::l 1 1 1 1 
Qy 1183 QIATDEDSGILLYKGDKDHIAVELYRGRVRASYDTGSHPASAIYSVETINDGNFHIVELL 1242 

Db 358 AIKKNFTLRVDRGLARSIINEGSNDYLRLTTPMFLGGLPVDPAQQAYKNWQIRNLTSFKG 417 

h :::| II I :: I I : |:: :|:::||:| : : : :| III I 
Qy 1243 ALDQSLSLSVDGGNPKIITNLSKQSTLNFDSPLYVGGMPGKSNVASLRQAPGQNGTSFHG 1302 

Db 418 CMKEVWINHKLVDFGNAQRQQKITPGCAL-LEGE-QQEE-EDDEQD-FM-D-ET-PHI 468 

I:::: II III : II III : : I I : I : 

Qy 1303 CIRNLYINSELQDFQKVPMQTGILPGCEPCHRKVCAHGTCQPSSQAGFTCECQEGWMGPL 1362 

Db 469 KEEPV-DPCLENKCRRGSRCVPNSNARDGYQCKCKHGQRGRYCDQGE 514 

:: MM III :h hi II :| III |: I lh I 
Qy 1363 CDQRTNDPCLGNKCVHGT-CLPI-NAF-SYSCKCLEGHGGVLCDEEE 1406



RESULT 5 

ID Q20204 PRELIMINARY; PRT; 601 AA. 

AC Q20204; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE) 

DE F40E10.4 PROTEIN (FRAGMENT). 

GN F40E10.4.

OS CAENORHABDITIS ELEGANS. 

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA; 

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS.

RN [1] 

RP SEQUENCE FROM N.A. 

RA SMYE R.;

RL SUBMITTED (FEB-1996) TO EMBL/GENBANK/DDBJ DATA BANKS. 

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 94150718. 

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,

RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,

RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,

RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,

RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,

RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.

RT elegans.";

RL NATURE 368:32-38(1994). 

DR EMBL; Z69792; E1346469; -. 

DR PROSITE; PS01187; EGF_CA; 1.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 601 AA; 66669 MW; F8A72773 CRC32; 

Query Match 9.8%; Score 1107; DB 5; Length 601; 

Best Local Similarity 34.7%; Pred. No. 3.88e-192; 

Matches 182; Conservative 124; Mismatches 186; Indels 32; Gaps 16; 

Db 1 IKSKFIEAGIARCEYPNTVSNQLLLTAQPYQFTCDSKVPTKLATKCDLCLNSPCKNNAIC 60 

Ml : hlllll I ::: III: : Mh: I :: :||: 1 1 : : 1 1 1 1 : : I 
Qy 870 VKSEYKEPGIARCAGPGEMADKLLLTTPSKKFTCQGPVDVNILAKCNPCLSNPCKNDGTC 929

Db 61 ETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATC-KVAQAGRFNCYCNKGFEGD 118 

:: : I I I II I h I II ::M : :|| I :: III Mlh 
Qy 930 NSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFWCICADGFEGE 989 

Db 119 YCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEYEGKHCEDKLE 178 

II hill :: III : III : Ihl I MM I 1 1 : 1 1 : 

Qy 990 NCEVNVDDCEDNDCENNSTCVDG---------------INNYTCLCPPEYTGELCEEKLD 1034

Db 179 YCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNGGSCVDGILSY 238 

:h lllh:::lll :: I h!h |::|: ::||l : llhl h: :| 
Qy 1035 FCAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGY 1094 
Db 239 DCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQG-ECVASQNSSDFTCKCHEGFSGP 297

1:1 I: :||:;| I ; :| :|: I :|: | : : | | |: | 
Qy 1095 TCICPEGYSGLFCEFSPPMVL--PRTSPCDNFDCQNGAQCIVRINEP-I-CQCLPGYQGE 1150 

Db 298 SCDRQMSVGFKHPGAYLAL-DP-LASDGTITMTLRTTSKIGILLYYGDDHFVSAELYDGR 355 

I- :ll II :H : : : :: II: : I Mill II :: III II 
Qy 1151 KCEKLVSVNFINKESYLQIPSAKVRPQTNITLQIATDEDSGILLYKGDKDHIAVELYRGR 1210 

Db 356 VKLVYYIGNFPASHMYSSVKVNDGLPHRISIRTSERKCFLQIDKNPVQIVENSGKSDQLI 415 

I: I I: III :H :lll I : : : :: I :| |: I :| I 
Qy 1211 VRASYDTGSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITNLSKQSTLN 1270

Db 416 TKGKEMLYIGGLPIEKSQDMRRFHVKNSESLKGCISSITINEVPINLQQALENVNTEQS 475 

: 1 1 : 1 1 : 1 : : I: |: |: III :: II : :||: :::| : : 
Qy 1271 FDS-P-LYVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYIN--S-ELQD-FQKVPMQTG 1324

Db 476 CSATVNFCAGIDCGNGKCTNNALSPKGYMCQCDSHFSGEHCDEK 519 
: I |::| I I I |:|: I II:: 
Qy 1325 ILPGCEPCHKKVCAHGTC-QPS-SQAGFTCECQEGWMGPLCDQR 1366



RESULT 6

ID Q25059 PRELIMINARY; PRT; 406 AA.

AC Q25059; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE FIBROPELLIN III (FRAGMENT).

OS HELIOCIDARIS ERYTHROGRAMMA (SEA URCHIN).

OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; EUECHINOIDEA;

OC ECHINACEA; ECHINOIDA; ECHINOMETRIDAE; HELIOCIDARIS.

RN [1] 

RP SEQUENCE FROM N.A. 

RA BISGROVE B.W.; 

RL SUBMITTED (JUN-1995) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; L33862; G499688; -. 

DR PROSITE; PS00577; AVIDIN; 1. 

DR PROSITE; PS01186; EGF_2; 6.

DR PROSITE; PS01187; EGF_CA; 5.

DR PFAM; PF00008; EGF; 7.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 406 AA; 43475 MW; 45E6EE2C CRC32; 

Query Match 6.6%; Score 748; DB 5; Length 406;

Best Local Similarity 42.3%; Pred. No. 3.88e-118;

Matches 101; Conservative 47; Mismatches 76; Indels 15; Gaps 11;

Db 14 DDCNPNPCQNGAACI-DQVNDYECICPPGFTGDNCETDIDVCASAPCRNGGAC-V-DGV- 69

: I Mil I ::l I I: I I II II |::|: I I I ||::||:| : :| 
Qy 916 NPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEE 975 

Db 70 NGYTCNCIPGFDGDNCENNINECASNPCQNGGVCIDGVNGFVCTCQPGYTGTLCETDIDE 129 

:h I I Ihhlll |:::|' I hi : 1 : 1 1 : 1 : I I I III III :| 
Qy 976 DGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDF 1035 

Db 130 CA-S-NPCQNGGVCTDLVNM-YTCDCLAGFTGSNCETNINECASNPCLNGGACVDGVNGY 186 

II llll: : I I : : III :|: I :|: ::::| I I ||: I Mill 
Qy 1036 CAQDLNPCQHDSKCI-LTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGY 1094 

Db 187 VCQCLPNYTGTHCEIS----LD-V--CQSMPCQNGATCTNVGGDYSCECPPGYTGINCE 238

I I hi 11:1 I I:: Mill I : hi III I :|| 
Qy 1095 TCICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCE 1153 



RESULT 7 

ID Q25058 PRELIMINARY; PRT; 529 AA. 
AC Q25058; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE FIBROPELLIN IA (FRAGMENT).

OS HELIOCIDARIS ERYTHROGRAMMA (SEA URCHIN).

OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; EUECHINOIDEA;

OC ECHINACEA; ECHINOIDA; ECHINOMETRIDAE; HELIOCIDARIS.

RN [1]

RP SEQUENCE FROM N.A.



RL SUBMITTED (JUN-1995) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; L33861; G499686; -. 

DR PROSITE; PS00577; AVIDIN; 1. 

DR PROSITE; PS01186; EGF_2; 10.

DR PROSITE; PS01187; EGF_CA; 7.

DR PFAM; PF00008; EGF; 10.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 529 AA; 55543 MW; 6385F322 CRC32; 

Query Match 6.4%; Score 723; DB 5; Length 529; 

Best Local Similarity 40.2%; Pred. No. 4.63e-113; 

Matches 96; Conservative 49; Mismatches 79; Indels 15; Gaps 9; 

Db 137 DECASSPCLNGGQCI-NRINSYECVCAAGFNGVNCQTNIDECASDPCENGGIC-I-AGV- 192 

: I 1:11 III : :: I I I: Ihl :|: I I hll :|| I : I 
Qy 916 NPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEE 975 

Db 193 NGYTCNCASGYTGTNCETEIDECASMPCLNGGQCIEMVNGYTCQCAAGFTGVLCETDIDE 252 

:|: 1111:1 III ::|:| I I : |:: :| III |:: :|| III :| 
Qy 976 DGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDF 1035 

Db 253 CASD--PCQNGGVCTDTVNGYICSCVQGFTGSDCETNINECASGPCQNGGTCVDGVNGFV 310 

II I III: : I I :h I I hi I: ::::! I II: I |:|||: 
Qy 1036 CAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYT 1095 

Db 311 CQCPPNYTGTYCEIS---L----DACSSMPCQNGATC-VNVGANYICECPPGFAGQNCE 361

I II hi :ll:l I :| : lllll I I : l|:| l|: I : : 1 1 
Qy 1096 CICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEP-ICQCLPGYQGEKCE 1153 



RESULT 8 

ID Q25253 PRELIMINARY; PRT; 2653 AA. 

AC Q25253; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED)

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE NOTCH HOMOLOG SCALLOPED WINGS (SCL). 

GN SCL. 

OS LUCILIA CUPRINA (GREENBOTTLE FLY) (AUSTRALIAN SHEEP BLOWFLY).

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; OESTROIDEA; CALLIPHORIDAE; 

OC LUCILIA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=SS SEEKING;

RX MEDLINE; 96400928.

RA DAVIES A.G., GAME A.Y., CHEN Z., WILLIAMS T.J., GOODALL S., YEN J.L.,

RA MCKENZIE J.A., BATTERHAM P.;

RT "Scalloped wings is the Lucilia cuprina Notch homologue and a

RT candidate for the modifier of fitness and asymmetry of diazinon

RT resistance.";

RL GENETICS 143:1321-1337(1996). 

RN [2] 

RP SEQUENCE OF 39-265 FROM N.A. 

RC STRAIN=SS SEEKING;

RA CHEN Z., NEWSOME T., MCKENZIE J.A., BATTERHAM P.;

RL SUBMITTED (DEC-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.

RN [3]

RP SEQUENCE OF 39-265 FROM N.A.

RC STRAIN=SS SEEKING;

RA CHEN Z., MCKENZIE J.A., BATTERHAM P.;

RL SUBMITTED (NOV-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; U58977; G1389670; -. 

DR EMBL; AF032672; G2654074; -. 

DR EMBL; AF032670; G2654074; JOINED. 

DR EMBL; AF032671; G2654074; JOINED. 

DR EMBL; AF032673; G2654075; -. 
DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS01186; EGF_2; 28.

DR PROSITE; PS01187; EGF_CA; 21.

DR PFAM; PF00008; EGF; 35.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 2653 AA; 285928 MW; 8F35FD2D CRC32;

Query Match 6.2%; Score 704; DB 5; Length 2653; 

Best Local Similarity 28.6%; Pred. No. 3.27e-109; 

Matches 159; Conservative 131; Mismatches 215; Indels 51; Gaps 45; 

Db 392 DACTSNPCHADAICDTSPINGSYTCPCATGYKGVDCSEDIDECDQGSPCEHNGVC-VN-T 449 

::! III! : |:: |:: I I I: |:|| II I I ::|| I I I :: 
Qy 916 NPCLSNPCKNDGTCNSDPVD-FYRCTCPYGFKGQDCDVPIHACI-SNPCKHGGTCHLKEG 973 

Db 450 P-GSFRCNCSQGFTGPRCETNINECESHPCQNEGSCLDDPGTFRCVCMPGFTGTQCEIDI 508 

:M h:ll I II l:::ll : hl:::|:| : |:| I :|l II : 
Qy 974 EEDGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKL 1033 

Db 509 NEC-QS-NPCLNGGICNDMINGFKCSCALGFTGSRCQINIDDCQSQPCRNNGICRDSIAG 566
:M III : : I :!ll! I: I: I :|:|::|||| |:| : I |:: I 
Qy 1034 DFCAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNG 1093 

Db 567 YTCQCPPGYTGLSCEIN--INDCNSNPCHRGKCIDGDNRFTCVCDPGFTGYLCQTQINEC 624 

III II Ihll II::- : ::|| I :| : : :| ; I I : I 
Qy 1094 YTCICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEP-ICQCLPGYQGEKC 1152 

Db 625 ESNPCQYGGHCVDRVGSYMCHCLAGTSGKDCEINVN-ECHSNPCNNGATCIDGINKYTCQ 683 

I : : : ::: ||: : :::: : : |: : :: ::| | :| 
Qy 1153 E-KLV--SVNFINK-ESYL-Q-IPSAKVRP-QTNITLQIATDE-DSGILLYKG-DKDHIA 1203 

Db 684 CVPGFTGVHCEININECASNPCANNGVCMDLVNGYKCECPRGFYDPRCLSDVDECASNPC 743 

I :. I : : : :|:| |: :: :| : : : || : ::|| 
Qy 1204 -VELYRG-RVRAS-YDTGSHP-ASAIYSVETINDGNFHIVELLALDQSLSLSVD-GGNPK 1258 

Db 744 INGGRCEDGINEFICHCPPGY-GGKRCENDIDECSSNPCQHG-GF--CVDELNAFKCQCM 799 

I :: :| I I lh ::: I |:| :| |: :| :: : : 
Qy 1259 IITNLSKQSTLNF--DSPL-YVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLY-INSE-L 1313 

Db 800 PGYTGLKCETNI--D-D-CINNPCANGGTCIDKVN-GYKCVCKVPYTGQDCESKL-DPCA 853 

: : :| I : I :: Ihl II |: I I I |: : III 
Qy 1314 QDFQKVPMQTGILPGCEPCHKKVCAHG-TCQPSSQAGFTCECQEGWMGPLCDQRTNDPCL 1372 

Db 854 TNRCRNDAKCTPSPNFLDFSCTCKLGYTGRYCDEDID-E-CKLSTPCRNGATCH-NVPG 909 

1:1 : : I I I =11 I I I III: I : I : |::| h : I 
Qy 1373 GNKCVHGT-CLPINAF-SYSCKCLEGKGGVLCDEEEDLFNPCQ-AIKCKHG-KCRLSGLG 1428 

Db 910 -SYRCICAKGYEGHDC 924
:| I I: III I 
Qy 1429 QPY-CECSSGYTGDSC 1443 



RESULT 9 

ID O16004 PRELIMINARY; PRT; 2531 AA.

AC O16004;

DT 01-JAN-1998 (TREMBLREL. 05, CREATED)

DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE NOTCH HOMOLOG.

OS LYTECHINUS VARIEGATUS (SEA URCHIN).

OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; EUECHINOIDEA; 

OC ECHINACEA; TEMNOPLEUROIDA; TOXOPNEUSTIDAE; LYTECHINUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 97454256. 

RA SHERWOOD D.R., MCCLAY D.R.; 

RT "Identification and localization of a sea urchin Notch homologue; 

RT insights into vegetal plate regionalization and Notch receptor 

RT regulation.";

RL DEVELOPMENT 124:3363-3374(1997).

DR EMBL; AF000634; G2570351; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 21.

DR PROSITE; PS01186; EGF_2; 25.

DR PROSITE; PS01187; EGF_CA; 20.

DR PFAM; PF00008; EGF; 34. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3. 

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 2531 AA; 273982 MW; BB9C6F3D CRC32; 

Query Match 6.1%; Score 687; DB 5; Length 2531;

Best Local Similarity 40.3%; Pred. No. 8.95e-106; 

Matches 96; Conservative 54; Mismatches 73; Indels 15; Gaps 11; 

Db 631 CSSNPCVNDGTC-VDGINEYTCMCHEGYRGLNCEEDIDDCESRPCHNGGTC-V-D-EVNG 686 

I llll IIM I :; I I I I:: :|: I I I II :|||| : : I :| 
Qy 918 CLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDG 977 

Db 687 YHCLCPIGYHDPFCMSNINECSSNPCVNGGSCHDGVNEYSCECMAGYTGTRCTDDFDECS 746 

: 1:1: I: I |:::| II:: ||:|:|:| : III | ; ;| |; 
Qy 978 FWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDFCA 1037 

Db 747 -S-NPCQHGGTCDNRHAFYNCTCQAGYTGLNCEVNIDDCVDEPCLNGGICIDEVNSFQCV 804 

Mill : I "I I :|l I :|::::lll I: I lh I I lh: h 
Qy 1038 QDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCI 1097 

Db 805 CPQTFVGLLCE-TE-----R-SPCEDNQCQNGATCVYSEDYAGYSCRCTSGFQGNFCD 855

II: : Ihll : I III:: :IMII h : hi :hll: |: 
Qy 1098 CPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRIN-EPI-CQCLPGYQGEKCE 1153 



RESULT 10 



ID O35516 PRELIMINARY; PRT; 2470 AA.

AC O35516;

DT 01-JAN-1998 (TREMBLREL. 05, CREATED)

DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE CELL SURFACE PROTEIN.

GN NOTCH2.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=C57B/6; TISSUE=THYMUS;

RX MEDLINE; 93178563.

RA LARDELLI M., LENDAHL U.;

RT "Motch A and motch B--two mouse Notch homologues coexpressed in a

RT wide variety of tissues.";

RL EXP. CELL RES. 204:364-372(1993).

RN [2]

RP SEQUENCE FROM N.A.

RC STRAIN=C57B/6; TISSUE=THYMUS;

RA HAMADA Y., HIGUCHI M., TSUJIMOTO Y.;

RL SUBMITTED (JUL-1994) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; D32210; D1022953; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS01186; EGF_2; 27.

DR PROSITE; PS01187; EGF_CA; 22.

DR PFAM; PF00008; EGF; 34.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 2.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 2470 AA; 265325 MW; CA94E03A CRC32;





Query Match 6.0%; Score 680; DB 11; Length 2470; 

Best Local Similarity 29.5%; Pred. No. 2.32e-104; 

Matches 162; Conservative 116; Mismatches 220; Indels 51; Gaps 40; 

Db 184 NECDIPGRCQHGGTCLNLPGS-YRCQCGQGFTGQHCDSP-YV-RGLPCVNGGTCR-QTGD 239 

I I :: I : III : I llll II II II I : II : 1 1 1 1 : |: 
Qy 916 NPC-LSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGE 974 

Db 240 FT-LECNCLPGFEGSTCERNIDDCPNHKCQNGGVCVDGVNTYNCRCPPQWTGQFCTEDVD 298

: I I MM II hill :: hi lllhl I I |||: II;; | :| 
Qy 975 EDGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLD 1034 

Db 299 ECLLQPNACQNGGTCTNRNGGYGCVCVNGWSGDDCSENIDDCAYASCTPGSTCIDRVASF 358 

I : hll: : I h I I I h I ::||| I hill:; 
Qy 1035 FCAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGY 1094 

Db 359 SCLCPEGKAGLLCH L - - DDAC ISNPCHKGALCDTNPLNGQ YICTCPQGYKGADC 410 

:hllll :lhl I :| : h:|l I : I 111 III ! 

Qy 1095 TCICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQC-IVRINEP-ICQCLPGYQGEKC 1152

Db 411 TEDVDECAMANSNPC-E-HAGKC-VNTDGAFHCEC-LK-G---YAGPRCEMDINECHSDP 462

I : I :: : ::| h ::: III:::: :: 
Qy 1153 -EKLVSVNFINKESYLQIPSAKVRPQTNITLQIATDEDSGILLYKGDKDHIAVELYRGRV 1211

Db 463 CQN-DATCLDKIGGFTCLCMP-G-FKGVHCELEVNECQSNPC-VNNGQCVDKVNRFQCLC 518

f: I: : :: : | | | | ::: | : | : :::: | 
Qy 1212 RASYDTGSHPASAIYSVETINDGNFHIVE-LLALDQSLSLSVDGGNPKIITNLSKQSTLN 1270

Db 519 PPGFTGPVC-QIDIDDCSSTPCLNGAKCIDH-PNGY-ECQCATGFTGILCDENI-DNC 572
:| : I : :: :| lh I I : : I : : I I 
Qy 1271 FDSPLYVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSE-LQDFQKVPMQTGILPGC 1329 

Db 573 DP-DP--CHHGQCQDGIDS-YTCICNPGYMGAICSDQIDE-CYSSPCLNDGRCIDLVN-G 626

:| Ml II : :: :|| I I lh:| :: :: I :: h: I h : : 
Qy 1330 EPCHKKVCAHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKCVH-GTCLPINAFS 1388

Db 627 YQCNCQPGTSGLNC-EIN-FDDCASNPCMHGVC-VDGINR-YSCVCSPGFTGQRCNIDI 681 

I hi I :|: I I : I: I : hll I : |: : I I 1 1 : 1 :| I : h :l 
Qy 1389 YSCKCLEGHGGVLCDEEEDLFNPCQAIKCKHGKCRLSGLGQPY-CECSSGYTGDSCDREI 1447

Db 682 DECASNPCR 690 

I :: I 
Qy 1448 S-CRGERIR 1455 



RESULT 11 

ID Q90656 PRELIMINARY; PRT; 728 AA. 

AC Q90656;

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE TRANSMEMBRANE PROTEIN C-DELTA-1. 

OS GALLUS GALLUS (CHICKEN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ARCHOSAURIA; AVES;

OC NEOGNATHAE; GALLIFORMES; PHASIANIDAE; PHASIANINAE; GALLUS.

RN [1]

RP SEQUENCE FROM N.A.

RC TISSUE=SPINAL CORD;

RX MEDLINE; 95319507.

RA HENRIQUE D., ADAM J., MYAT A., CHITNIS A., LEWIS J., ISH-HOROWICZ D.;

RT "Expression of a Delta homologue in prospective neurons in the

RT chick.";

RL NATURE 375:787-790(1995).

DR EMBL; U26590; G882412; -.

DR PROSITE; PS01186; EGF_2; 8.

DR PROSITE; PS01187; EGF_CA; 2.

DR PFAM; PF00008; EGF; 6.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 728 AA; 79861 MW; 7439F575 CRC32;

Query Match 5.9%; Score 666; DB 13; Length 728; 

Best Local Similarity 39.5%; Pred. No. 1.54e-101;

Matches 94; Conservative 50; Mismatches 81; Indels 13; Gaps



Db 303 KPCKNGATCTNTGQGSYTCSCRPGYTGSSCEIEINECDANPCKNGGSC--TD-LENSYSC 359 

:||ll :M : I hi h I h: h I :lllhlhl : h:: I 
Qy 921 NPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFWC 980 

Db 360 TCPPGFYGKNCELSAMTCADGPCFNGGRCTDNPDGGYSCRCPLGYSGFNCEKKIDYCS-S 418 

I: II I llh: II I I : I I : hi II hi II hhh 
Qy 981 ICADGFEGENCEVNVDDCEDNDCENNSTCVDGINN-YTCLCPPEYTGELCEEKLDFCAQD 1039 



Db 419 -SPCANGAQCVDLGNSYICQCQAGFTGRHCDDNVDDCASFPCVNGGTCQDGVNDYSCTCP 477

' ill : : h ::: hi :|: I III : III I lh I hll hi II 
Qy 1040 LNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICP 1099

Db 478 PGYNGKNC--STP-V-SR---CEHNPCHNGATCHERSNRYVCECARGYGGLNCQFLLP 528

11:1 I I I I :| h: hill I I I :|:| II I :|: |:: 
Qy 1100 EGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEKLVS 1157



RESULT 12

ID O61240 PRELIMINARY; PRT; 2352 AA.

AC O61240;

DT 01-AUG-1998 (TREMBLREL. 07, CREATED)

DT 01-AUG-1998 (TREMBLREL. 07, LAST SEQUENCE UPDATE)

DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE)

DE HRNOTCH PROTEIN.

GN HRNOTCH.

OS HALOCYNTHIA RORETZI (SEA SQUIRT).

OC EUKARYOTA; METAZOA; CHORDATA; UROCHORDATA; ASCIDIACEA; STOLIDOBRANCHIA;

OC PYURIDAE; HALOCYNTHIA.

RN [1]

RP SEQUENCE FROM N.A.

RA HORI S., SAITOH T., MATSUMOTO M., MAKABE K.W., NISHIDA H.;

RL DEV. GENES EVOL. 207:371-380(1997).

DR EMBL; AB001327; D1026501;

DR PROSITE; PS00010; ASX_HYDROXYL; 18.

DR PROSITE; PS01186; EGF_2; 22.

DR PROSITE; PS01187; EGF_CA; 18.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 2352 AA; 252623 MW; 816976D4 CRC32;



Query Match 5.9%; Score 666; DB 5; Length 2352;

Best Local Similarity 26.7%; Pred. No. 1.54e-101; 

Matches 147; Conservative 131; Mismatches 226; Indels 47; Gaps 38; 

Db 756 SPCVPNPCENGATCQ-ESADYLAYVCQCPEGFRGPTCATDINECVNSPCKNGGGC--TN- 811 

. :lh:IM I =11 :: h I I II 1 1 : 1 I h h::|lhll I : 
Qy 916 NPCLSNPCKNDGTCNSDPVDF--YRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEG 973 

Db 812 LVPGYQCTCSQGFTGKDCDTDIDDCSSNPCLNGGQCLDDVGSYKCLCLPGFEGNNCQEEV 871 

I: I h:|| I :|: ::III I I I : h I : : I 1 1 1 I : h hi : 
Qy 974 EEDGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKL 1033 

Db 872 NECA-SF-PCKNGGICTDYVNSYVCTCLSGFYSLDCEKNIEDCSSSSCMNGGTCVDGINS 929 

: II : II : : I ::: I I :|: : h :::|| : hlh I |::|: 
Qy 1034 DFCAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNG 1093 

Db 930 YSCSCTANFTGDKCQ-NA--V---NN-CASLQCQNGGTCYYDSGDPKCACVHGYTGTHCE 982

1:1 I ::| h :: I : I :::||||: I :| I h III II 
Qy 1094 YTCICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCE 1153 

Db 983 SLQNLC-TGPNIC-KNGGSCVQTSNTVSCNCLGGYEGTDCAVPQVSCT-VGASLLGIAVS 1039

I :: : :: |: :: I : : :: I I 

Qy 1154 KLVSVNFINKESYLQIPSAKVRPQTNITLQ-IATDEDSGILLYKGDKDHIAVELYRGRVR 1212 

Db 1040 DLCLNGGTCHDTSTABECSCVAGFTGSYCDI-DIDECASVPCKNGATCNDLINSYSCICA 1098 

: I: I :|: : :: :|: |:: :|: : :| : | : 

Qy 1213 A-SYDTGS-HPASAIYSVETINDGNFHIVELLALDQSLSLSV-DGGNPK-IITNLSKQST 1268 

Db 1099 LGYEGATC---LTDKDECAS---SPCKNGGTCIDRINSFYCSC-LAGTEGV-L-CEI-NE 1148

I :::: : I : II :| II : I ::| : I : I : I 
Qy 1269 LNFDSPLYVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQDFQKVPMQTGILPG 1328 

Db 1149 -DECEINICLNGGVCI-DGIGGFSCQCPSGYEGRRCQGDVNE-CLSNPCSSPGSLACIQG 1205 

: I -I :l I : : 1 1 : 1 : 1 llh h 11 : 1 I h I : 
Qy 1329 CEPCHKKVCAHG-TCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKCVHGTCLP-INA 1386 

Db 1206 SNSYQCVCDADYTGSEC-QIR-I-GSCDINPCLNDGICTDNSQDITGYKCQCTWGYYGKK 1262 

II I I I I : : :h I : I I :: I hh II I 
Qy 1387 F-SYSCKCLEGHGGVLCDEEEDLFNPCQAIKC-KHGKCRLSGLGQP-Y-CECSSGYTGDS 1442

Db 1263 CENSYSMCSAN 1273 
I: II:: 
Qy 1443 CDREIS-CRGE 1452



RESULT 13 

ID O13149 PRELIMINARY; PRT; 2447 AA.

AC O13149;

DT 01-JUL-1997 (TREMBLREL. 04, CREATED)

DT 01-JUL-1997 (TREMBLREL. 04, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE NOTCH 2 (FRAGMENT).

OS FUGU RUBRIPES (JAPANESE PUFFERFISH).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII;

OC TELEOSTEI; EUTELEOSTEI; ACANTHOPTERYGII; PERCOMORPHA; 

OC TETRAODONTIFORMES; TETRAODONTOIDEI; TETRAODONTIDAE; FUGU. 

RN [1] 

RP SEQUENCE FROM N.A. 

RA NAKAMURA T., TROWSDALE J.;

RL SUBMITTED (JUN-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; AB004829; D1021371; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS01186; EGF_2; 29.

DR PROSITE; PS01187; EGF_CA; 20.

DR PFAM; PF00008; EGF; 35.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 2447 AA; 262542 MW; 3CDA4F7A CRC32;

Query Match 5.9%; Score 672; DB 13; Length 2447; 

Best Local Similarity 29.1%; Pred. No. 9.53e-103; 

Matches 163; Conservative 115; Mismatches 234; Indels 48; Gaps 43;

Db 332 DACISKPCKGGSKCDTNPISGMFNCNCPSGYTGSTCSIDRDECSIGTNPCEHGGQC-VN- 389 

::|:|:MI : h-h : I II |: I I : II :||| III I :: 
Qy 916 NPCLSNPCKNDGTCNSDPVD-FYRCTCPYGFKGQDCDVPIHAC-I-SNPCKHGGTCHLKE 972

Db 390 TE-GSFTCNCAKGYAGPRCEQDVNECASNPCQNDGTCLDRIGDYSCICMPGFGGTHCENE 448 

I M I II I: I II :|::| I |:|::||:| I :|:|:| I : I ||: 
Qy 973 GEEDGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEER 1032 

Db 449 LNECL-S-SPCLNRGKCLDQVSRFVCECPAGFSGEMCQIDIDECSSTPCLNGAKCIDLPN 506 

I: I HI : :ll: I hi :|: I |:||:|:| MINI I 
Qy 1033 LDFCAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVN 1092 

Db 507 GYDCECAEGFKGLLCEENINDCVPEPCHHGQCKDGIATFSCECYAGYTGAICNIQVQECH 566 

II I 1:11: 11:11 :: : I : : I : : :| :|| : : 
Qy 1093 GYTCICPEGYSGLFCE--FSPPMVLP-RTSPCDNFDCQNGAQCIVRINEPICQC-LPGYQ 1148



Db 567 SNPCQNRGRCIDLVN-AYQCNCPPGISGVNCEINEDDCASNLCVYGECQDGINEYKCVC 624

:: I: : ::::| :| |:: :| : : : | :: | 

Qy 1149 GEKCE-KLVSVNFINKESYL-QIPSAKVRPQTNITLQIATDEDSGILLYK-GDKDHIAV- 1204



Db 625 SPGYTGDKCDVDINECSSNPCMSGGTCVDNVN-G-FHCLCPPSTYGLLCLSGTDHCVAQP 682
II: : :hl I: h :| I II : : 1111:1' 
Qy 1205 EL-YRG-RVRAS-YDTGSHP-ASAIYSVETINDGNFHIVELLALDQSLSLS-VD-GGNP 1257 

Db 683 CVHGKCIEQQNGYFCQCEAGWVGQHCEQEKDECL-PNPCQNGGSCLD-RHNGFTCVCQAG 740 

: : I I : || : I I III I :| : 
Qy 1258 KIITNLSKQSTLNF-DSPL-YVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQD 1315 

Db 741 YRGVNCEKNI--D-E-CTSGPCLNQGICI-DGLNSYTCQCVPPFAGEHCEVEL-DPCSSR 794 

:: I : I II I :| I ; ::||:| | |: ||| : 
Qy 1316 FQKVPMQTGILPGCEPCHKKVC-AHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGN 1374 

Db 795 PCQRGGVCLPSADYTYFTCRCPAGWQGLHCSE--DV-NECKKNPCRNGGHC-INSPG-SY 849 

I \ III ::l :hl I I: I I hi h: | ::: | ;| 
Qy 1375 KCVHG-TCLPINAFSY-SCKCLEGHGGVLCDEEEDLFNPCQAIKCKHG-KCRLSGLGQPY 1431 

Db 850 ICKCPSGYSGHNCQTDIDDC 869 

I 1 : 1 1 1 : J :|: :| I 
Qy 1432 -CECSSGYTGDSCDREIS-C 1449 



RESULT 14 

ID O57409 PRELIMINARY; PRT; 615 AA.

AC O57409;

DT 01-JUN-1998 (TREMBLREL. 06, CREATED)

DT 01-JUN-1998 (TREMBLREL. 06, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE DELTAB.

GN DELTAB.

OS BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII;

OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA;

OC CYPRINIDAE; RASBORINAE; DANIO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 98165391. 

RA HADDON C., SMITHERS L., SCHNEIDER-MAUNOURY S., COCHE T., HENRIQUE D.,

RA LEWIS J.;

RT "Multiple delta genes and lateral inhibition in zebrafish primary

RT neurogenesis.";

RL DEVELOPMENT 125:359-370(1998).

DR EMBL; AF006488; G2772825; -.

DR PROSITE; PS01186; EGF_2; 7.

DR PROSITE; PS01187; EGF_CA; 2.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 615 AA; 67592 MW; FC6348F8 CRC32; 

Query Match 5.8%; Score 657; DB 13; Length 615; 

Best Local Similarity 38.1%; Pred. No. 1.00e-99;

Matches 91; Conservative 56; Mismatches 76; Indels 16; Gaps 11; 

Db 274 CTNHKPCANGATCTNTGQGSYTCTCRPGFGGTNCELEINECDCNPCKNGGSCN--DLEND 331 

I :: II I :N : I III II I :|:: h I ||||:||:|: : |:| 

Qy 918 CLSN-PCRNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEED 976 

Db 332 -YSCTCPQGFYGKNCEIIAMTCADDPCFNGGTCEEKFTGGYVCRCPPTFTGSNCEKRLDR 390 

: I h:ll I III: I h I I :|| : : I I III :|l II :|| 
Qy 977 GFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINN-YTCLCPPEYTGELCEERLDF 1035 

Db 391 CSH--KPCANGGECVDLGASAL-CRCRPGFSGSRCETNIDDCARYPCQNAGTCQDGINDY 447 

I" :|| : : I: I : :: I I II: I :|: ::||| I |:: I |::| I 
Qy 1036 CAQDLNPCQHDSKCI-LTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGY 1094 

Db 448 TCTCTLGFTGKNCSLRADACL--TNPC--L--HGGT-CFTHFSGPVCQCVPGFMGSTCE 499

II I l::l I : : I hll : : |: |: ::: 1 : 1 1 1 : 1 1 : I II 

Qy 1095 TCICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCE 1153 



RESULT 15 

ID O42347 PRELIMINARY; PRT; 1212 AA.

AC O42347;

DT 01-JAN-1998 (TREMBLREL. 05, CREATED) 

DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE C-SERRATE-2 (FRAGMENT). 

OS GALLUS GALLUS (CHICKEN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ARCHOSAURIA; AVES; 

OC NEOGNATHAE; GALLIFORMES; PHASIANIDAE; PHASIANINAE; GALLUS. 

RN [1] 

RP SEQUENCE FROM N.A.

RX MEDLINE; 97184054.

RA HAYASHI H., MOCHII M., KODAMA R., HAMADA Y., MIZUNO N., EGUCHI G.,

RA TACHI C.;

RT "Isolation of a novel chick homolog of Serrate and its coexpression

RT with C-Notch-1 in chick development.";

RL INT. J. DEV. BIOL. 40:1089-1096(1996).

DR EMBL; D87558; D1022568; -.

DR PROSITE; PS01186; EGF_2; 10.

DR PROSITE; PS01187; EGF_CA; 8.

DR PFAM; PF00008; EGF; 14.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 1212 AA; 134188 MW; 0ECF076C CRC32;

Query Match 5.8%; Score 659; DB 13; Length 1212;

Best Local Similarity 39.6%; Pred. No. 3.96e-100;

Matches 93; Conservative 48; Mismatches 79; Indels 15; Gaps 10; 

Db 285 HPCLNGGTCMNTEPDE-YRCACPDGYSGKNCEIAEHACVSNPCANGGTCH--E-ISSSFK 340

:|| I III : lll:|l |: I :|::: NIMH :|IMI I :| 
Qy 921 NPCKNDGTC-NSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEEDGFW 979 

Db 341 CHCPSGWSGPTCAIDIDECASNPCAQGGTCIDHINSFECICPQQWIGATCQLDANEC-EG 399

MM I I :::hl I I :lhl II:: |:|| : I I: : I : 
Qy 980 CICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLDFCAQD 1039 

Db 400 -KPCVNAYSCKNLIGGYYCDCIPGWKGVNCHININDCHGQ-CQHGGTCKDEVNDYHCICP 457 

:M : I I: III II I :| MM: I :|: I I II I Nil 
Qy 1040 LNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVNGYTCICP 1099 

Db 458 RGFTGKNCE-----I-E-TNECESNPCQNGGRCKDLVNGFTCLCAQGFSGVFCEM 505
II : I: I" llll::l :| MM ||: 
Qy 1100 EGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGEKCEK 1154



Search completed: Fri May 28 08:35:44 1999 
Job time : 261 sees. 
Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Fri May 28 08:41:25 1999; MasPar time 6.83 Seconds 

326.882 Million cell updates/sec 



Tabular output not generated. 

Title: MJS-09-191-647-3 

Description: (1-105) from US09191647.pep

Perfect Score: 730 

Sequence: 1 SPCTCSNNIVDCRGKGLMEI ITEIAKGLFDGLVSLOLLLL 105 

Scoring table: PAM 150 
Gap 11 

Searched: 170751 seqs, 21266608 residues 

Post-processing: Minimum Match 0%

Listing first 45 summaries 

Database: a-geneseq35 

1:part1 2:part2 3:part3 4:part4 5:part5 6:part6 7:part7
8:part8 9:part9 10:part10 11:part11 12:part12 13:part13
14:part14 15:part15 16:part16 17:part17 18:part18
19:part19 20:part20 21:part21 22:part22 23:part23
24:part24 25:part25 26:part26 27:part27 28:part28
29:part29 30:part30 31:part31 32:part32 33:part33
34:part34 35:part35 36:part36 37:part37 38:part38
39:part39

Statistics: Mean 28.979; Variance 136.917; scale 0.212

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
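
The run header above names the Smith-Waterman algorithm, a gap value of 11 and a throughput in cell updates per second. The following is a minimal sketch of a Smith-Waterman local-alignment score, not the program's own implementation. It assumes a linear gap penalty of 11 per gapped position (the report does not say how the gap value is applied) and uses a hypothetical toy_score function in place of the PAM 150 table, which is not reproduced in this report. Each dynamic-programming cell that is filled counts as one cell update.

```python
def smith_waterman(query, subject, score, gap=11):
    """Best local-alignment score under a linear gap penalty (an assumption).

    Returns (best_score, cells_updated).
    """
    prev = [0] * (len(subject) + 1)
    best = 0
    cells = 0
    for q in query:
        curr = [0]
        for j, s in enumerate(subject, start=1):
            h = max(
                0,                          # start a new local alignment
                prev[j - 1] + score(q, s),  # align q with s
                prev[j] - gap,              # q aligned against a gap
                curr[j - 1] - gap,          # s aligned against a gap
            )
            curr.append(h)
            cells += 1
            best = max(best, h)
        prev = curr
    return best, cells


def toy_score(a, b):
    # Hypothetical stand-in for the PAM 150 scoring table.
    return 5 if a == b else -3


def million_cell_updates_per_sec(query_len, db_residues, seconds):
    # Every query residue is compared against every database residue once.
    return query_len * db_residues / seconds / 1e6


# First ten aligned residues of the query and of the top hit (W46966).
best, cells = smith_waterman("CTCSNNIVDC", "CTCSNGIVDC", toy_score)

# Header values: 105-residue query, 21266608 residues searched, 6.83 seconds.
rate = million_cell_updates_per_sec(105, 21266608, 6.83)
print(best, cells, round(rate, 1))
```

With the header values the computed rate comes out close to the reported 326.882 million cell updates per second; the small difference is consistent with the MasPar time being printed to two decimal places.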



                               SUMMARIES

Result           Query
   No.   Score   Match  Length  DB  ID      Description            Pred. No.

     1     581    79.6    1534  30  W46966  Amino acid sequence o  2.48e-40
     2     459    62.9    1480   5  R25079  Drosophila SLIT prote  1.22e-29
     3     234    32.1     345  23  W09405  Pineal gland specific  1.95e-10
     4     224    30.7     692   2  R08038  Rat testicular lutein  1.30e-09
     5     223    30.5     332  15  R87953  Bovine neurotrophic b  1.57e-09
     6     223    30.5     368   1  R05159  Sequence of human bon  1.57e-09
     7     221    30.3     369  15  R87951  Rat neurotrophic bigl  2.30e-09
     8     221    30.3     369  15  R87952  Human neurotrophic bi  2.30e-09
     9     221    30.3     634   6  R30520  N-terminal of LH rece  2.30e-09
    10     221    30.3     689   6  R30509  N-terminal of LH rece  2.30e-09
    11     221    30.3     692   6  R30503  N-terminal of LH rece  2.30e-09
    12     221    30.3     695   6  R30506  N-terminal of LH rece  2.30e-09
    13     221    30.3     695   6  R30525  N-terminal of LH rece  2.30e-09
    14     221    30.3     695   6  R30524  N-terminal of LH rece  2.30e-09
    15     221    30.3     696   6  R30523  N-terminal of LH rece  2.30e-09
    16     221    30.3     698   6  R30505  N-terminal of LH rece  2.30e-09
    17     219    30.0     690   6  R30514  N-terminal of LH rece  3.35e-09
    18     219    30.0     693   6  R30510  N-terminal of LH rece  3.35e-09
    19     219    30.0     696   6  R30521  N-terminal of LH rece  3.35e-09
    20     219    30.0     696   6  R30526  N-terminal of LH rece  3.35e-09
    21     219    30.0     696   6  R30513  N-terminal of LH rece  3.35e-09
    22     219    30.0     699   6  R30515  N-terminal of LH rece  3.35e-09
    23     217    29.7     390  20  W06532  Gonadotropin receptor  4.89e-09
    24     217    29.7     695  21  W14782  FSH receptor.          4.89e-09
    25     217    29.7     695   8  R42082  FSH receptor.          4.89e-09
    26     216    29.6     196   5  R29102  Drosophila SLIT prote  5.91e-09
    27     216    29.6     695   5  R27558  FSHR.                  5.91e-09
    28     214    29.3    1091  27  W41641  Sequence used in dete  8.61e-09
    29     211    28.9     620   6  R30522  N-terminal of LH rece  1.51e-08
    30     211    28.9     696   6  R30519  N-terminal of LH rece  1.51e-08
    31     209    28.6     904  39  W86351  Human DNAX toll-like   2.20e-08
    32     198    27.1     605  17  R85888  WD-40 domain-contg. i  1.72e-07
    33     193    26.4     837  39  W86361  Human DNAX toll-like   4.37e-07
    34     190    26.0     560  12  R71294  Human glycoprotein V.  7.63e-07
    35     188    25.8     139   8  R42263  Decorin sequence PT-7  1.10e-06
    36     188    25.8     186   8  R42264  Decorin sequence PT-7  1.10e-06
    37     188    25.8     234   8  R42265  Decorin sequence PT-7  1.10e-06
    38     188    25.8     280   8  R42266  Decorin sequence PT-7  1.10e-06
    39     188    25.8     305   8  R42267  Decorin sequence PT-7  1.10e-06
    40     188    25.8     331   8  R42260  Mature decorin PT-65.  1.10e-06
    41     188    25.8     342  17  R89439  Human recombinant dec  1.10e-06
    42     188    25.8     353   1  R05160  Sequence of human bon  1.10e-06
    43     188    25.8    1388  18  R89471  Collagen/decorin fusi  1.10e-06
    44     187    25.6     603  17  R85889  WD-40 domain-contg. r  1.33e-06
    45     182    24.9     799  39  W86352  Human DNAX toll-like   3.35e-06
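
The Query Match column of the summaries appears to be the hit's score expressed as a percentage of the Perfect Score (730) given in the run header. The sketch below checks that reading against the first four summary rows; the function name is illustrative, and the relationship is inferred from the printed numbers rather than stated anywhere in the report.

```python
PERFECT_SCORE = 730  # "Perfect Score" from the run header


def query_match_percent(score, perfect_score=PERFECT_SCORE):
    """Score as a percentage of the perfect score, to one decimal place."""
    return round(100.0 * score / perfect_score, 1)


# (ID, Score, Query Match as printed) for the first four summary rows.
rows = [
    ("W46966", 581, 79.6),
    ("R25079", 459, 62.9),
    ("W09405", 234, 32.1),
    ("R08038", 224, 30.7),
]

for accession, score, reported in rows:
    print(accession, query_match_percent(score), reported)
```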



RESULT 1 
ID W46966 standard; Protein; 1534 AA. 
AC W46966; 
DT 06-JUL-1998 (first entry) 
DE Amino acid sequence of a human slit-like polypeptide. 
KW Slit-like protein; human; diagnosis; treatment; brain-specific disease; 
KW cancer; antibody. 
OS Homo sapiens. 
FH Key Location/Qualifiers 
FT 1..26 
FT /note= "signal peptide" 
FT Protein 27..1534 
FT /note= "mature protein" 
PN J10087699-A. 
PD 07-APR-1998. 
PF 15-JUL-1997; 205351. 
PR 16-JUL-1996; JP-186219. 
PA (ASAH ) ASAHI KASEI KOGYO KK. 
DR WPI; 98-267127/24. 
DR N-PSDB; V16978. 
PT Human Slit-like protein - useful for diagnosis and treatment of 
PT brain-specific diseases and cancers 
PS Disclosure; Pages 31-35; 45pp; Japanese. 
CC The present sequence represents a novel human slit-like protein (the 
CC mature protein is claimed in Claim 1). The slit-like polypeptide is 
CC useful for diagnosis and treatment of brain-specific diseases and 
CC cancers. Antibodies directed against the protein, or its fragments 
CC can also be used for diagnosing cancer. 
SQ Sequence 1534 AA; 



Query Match 79.6%; Score 581; DB 30; Length 1534; 

Best Local Similarity 72.8%; Pred. No. 2.48e-40; 

Matches 75; Conservative 18; Mismatches 10; Indels 0; Gaps 0; 



Db 286 ctcsngivdcrgkgltaipanlpetmteirlelngiksippgafspyrklrridlsnnqi 345 
Mill Mlllllll MUM : Mill |:||:||:|||: 1 : 1 1 : 1 1 1 : 1 : 1 1 1 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 346 aeiapdafqglrslnslvlygnkitdlprgvfgglytlqllll 388 
:MIMMIIIM lllll||l||::::|:| || :|||||| 
Qy 63 SDIAPDAFQGLKSLTSLVLYGNKITEIAKGLFDGLVSLQLLLL 105 
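
The Best Local Similarity figure printed with each alignment appears to be the number of identical positions (Matches) divided by the total number of aligned columns (Matches + Conservative + Mismatches + Indels). The sketch below checks this against the counts reported for three results; the formula is inferred from the printed numbers and is not stated in the report, and the function name is illustrative.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all alignment columns."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


# (ID, Matches, Conservative, Mismatches, Indels, similarity as printed)
reported = [
    ("W46966", 75, 18, 10, 0, 72.8),
    ("R25079", 56, 24, 24, 0, 53.8),
    ("R08038", 34, 27, 42, 2, 32.4),
]

for accession, m, c, x, i, printed in reported:
    print(accession, best_local_similarity(m, c, x, i), printed)
```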



RESULT 2 

ID R25079 standard; Protein; 1480 aa. 

AC R25079; 

DT 05-JAN-1993 (first entry) 

DE Drosophila SLIT protein involved in axon pathway development. 

KW Neurogenesis; EGF-like repeats; epidermal growth factor; TAGON; 

KW embryonic CNS; leucine-rich repeat; Flank-LRR-Flank; 

KW midline glial cells; axonogenesis; cell-cell interaction; ss. 

OS Drosophila melanogaster. 

FH Key Location/Qualifiers 

FT peptide 1..36 

FT /label= signal 

FT domain 73..294 

FT /label= Flank_LRR_Flank_1 

FT /note= "mediates adhesive events" 

FT domain 295..518 

FT /label= Flank_LRR_Flank_2 

FT /note= "mediates adhesive events" 

FT domain 519..714 

FT /label= Flank_LRR_Flank_3 

FT /note= "mediates adhesive events" 

FT domain 715..910 

FT /label= Flank_LRR_Flank_4 

FT /note= "mediates adhesive events" 

FT region 911..1150 

FT /label= Tandem_EGF_like_repeats 

FT /note= "involved in protein-protein interactions" 

FT region 1353..1393 

FT /label= 7th_EGF_like_repeat 

FT /note= "involved in receptor-ligand interactions" 

FT region 1394..1404 

FT /label= alternative_splice_segment 

FT /note= "developmentally regulated" 

FT region 1405..1480 

FT /label= C-terminal region 

PN WO9210518-A. 

PD 25-JUN-1992. 

PF 27-NOV-1991; U09055. 

PR 07-DEC-1990; US-624135. 

PA (OTYA ) UNIV YALE. 

PI Artavanis-Tsakonas S, Rothberg JM; 

DR WPI; 92-234590/28. 

DR N-PSDB; Q25811. 

PT SLIT protein and sequence elements for treating 

PT neuro-degenerative disease - useful for Alzheimer's disease, 

PT nerve damage and Parkinson's disease, for diagnosis of cancer 
PS Claim 1; Page 84-89; 122pp; English. 
CC The SLIT protein is necessary for normal development of the midline 
CC of the CNS, partic. the midline glial cells, and for the 

CC concomitant formation of the commissural axon pathways. The process 

CC is dependent on the level of SLIT protein expression. It appears 

CC that SLIT protein is excreted by the midline glial cells where it 

CC is synthesised and is eventually associated with the surfaces of 

CC axons that traverse them. The SLIT protein is tightly localised to 

CC the muscle attachment sites and to the sites of contact between 

CC adjacent pairs of cardioblasts as they coalesce to form the lumen 

CC of the larval heart. The SLIT protein defines a new set of 

CC molecules (TAGONS) which play a key role in axon outgrowth and 

CC pathfinding. SLIT can be used as a nerve regenerative in 

CC neurodegenerative diseases such as Alzheimer's Disease, spinal cord 

CC injuries, brain injuries, crushed (optic) nerve, amyotrophic lateral 

CC sclerosis, diabetes-caused nerve damage, Parkinson's Disease, 

CC strokes, epilepsy, multiple sclerosis, paraplegia retinal 

CC degeneration and facial nerve damage. The 4 "Flank-Leucine-rich 

CC region-Flank" domains, the C-terminal region and part of the 

CC alternative splice segment (i.e. GEGSTEPFTVT) are all individually 

CC claimed as are molecules comprising at least 1 Flank-LRR-Flank domain 

CC and at least 1 EGF-like repeat element from the SLIT protein. 

CC See also R29102. 

SQ Sequence 1480 AA; 



Query Match 62.9%; Score 459; DB 5; Length 1480; 

Best Local Similarity 53.8%; Pred. No. 1.22e-29; 

Matches 56; Conservative 24; Mismatches 24; Indels 0; Gaps 0; 

Db 298 pcrcadgivdcreksltsvpvtlpddttdvrleqnfitelppksfssfrrlrridlsnnn 357 

II h: Mill 1:1 :| II: ::||lll I :|: :|: :::|:|||:|:| 
Qy 2 PCTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQ 61 

Db 358 isriahdalsglkqlttlvlygnkikdlpsgvfkglgslrllll 401 

II II II: III Ihllllllll ::: hi II Ihllll 
Qy 62 ISDIAPDAFQGLKSLTSLVLYGNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 3 

ID W09405 standard; Protein; 345 AA. 

AC W09405; 

DT 17-SEP-1997 (first entry) 

DE Pineal gland specific gene-1 protein. 

KW PGSG-1; pineal gland; epiphysis cerebri; tumour; precocious puberty; 

KW hydrocephalus; papilledema; intracranial pressure; circadian rhythm; 

KW pituitary secretion; luteinising hormone; growth hormone; 

KW follicular stimulating hormone. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT peptide 1..21 

FT /note= "Putative leader sequence" 

FT protein 22..344 

FT /label= Pineal_gland_specific_gene-1 

FT region 22..283 

FT /note= "Putative soluble portion of the protein" 

FT region 284..344 

FT /note= "Insoluble portion of the protein" 

FT region 283..344 

FT /note= "Putative transmembrane portion of the protein" 

PN WO9639158-A1. 

PD 12-DEC-1996. 

PF 05-JUN-1995; U07067. 

PR 05-JUN-1995; WO-U07067. 

PA (HUMA-) HUMAN GENOME SCI INC. 

PI He WW, Rosen CA; 

DR WPI; 97-042840/04. 

DR N-PSDB; T47647. 

PT Pineal gland specific gene-1 and corresponding protein - used in the 

PT treatment of pineal tumours and alleviation of side effect, e.g. 

PT precocious puberty, hydrocephalus etc. 

PS Claim 1; Page 42-43; 56pp; English. 

CC The present sequence represents a novel isolated polypeptide, 

CC pineal gland specific gene-1 (PGSG-1), which was derived from a human 

CC pineal gland tissue cDNA library. The PGSG-1 polypeptide may be used 

CC to treat pineal tumours and thereby treat the side effects, including 

CC precocious puberty, hydrocephalus, papilledema and other signs of 

CC increased intracranial pressure. PGSG-1 and its protein may be used to 

CC regulate biological rhythms, in particular circadian rhythms, and to 

CC regulate pituitary secretion of hormones which regulate the onset of 

CC puberty, e.g. luteinising hormone, follicular stimulating hormone (FSH) 

CC and growth hormone (GH) released by the pituitary. The (ant)agonists 

CC which act against the protein may also be used to regulate the secretion 

CC of these hormones. The PGSG-1 gene and proteins may also be used to 

CC diagnose a mutation in the PGSG-1 gene and hence susceptibility to a 

CC disease mentioned above. 

SQ Sequence 345 AA; 

Query Match 32.1%; Score 234; DB 23; Length 345; 

Best Local Similarity 38.8%; Pred. No. 1.95e-10; 

Matches 40; Conservative 21; Mismatches 42; Indels 0; Gaps 0; 

Db 27 cqsstnfvdcsqqglaeipshlppqtrtlhlqdnqihhlpafafrsvpwlmtlnlsnnsl 86 

I I hill II 1 1 1 : : 1 1 ::h:| I HI II h :::hl : 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 87 snlapgafhglqhlqvlnltqnsllslesrlfhslpqlreldl 129 

. h:|l Ihll MM:: MM 



Qy 63 SDIAPDAFQGLKSLTSLVLYGNKITEIAKGLFDGLVSLQLLLL 105 
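
Each database entry in this listing is laid out as lines that begin with a two-letter code (ID, AC, DT, DE, KW, OS, FH, FT, PN, PD, PF, PR, PA, PI, DR, PT, PS, CC, SQ). The following sketch groups such lines by code and reads a feature location such as "22.. 344". It is an illustration only: the function names are hypothetical, the sample lines are copied from RESULT 3, and the location pattern tolerates the stray spaces seen in this scanned copy.

```python
import re

LINE_CODE = re.compile(r"^([A-Z]{2})\s+(.*\S)\s*$")
LOCATION = re.compile(r"(\d+)\s*\.\.\s*(\d+)")


def parse_record(lines):
    """Group entry lines by their two-letter line code."""
    fields = {}
    for line in lines:
        m = LINE_CODE.match(line)
        if m:
            fields.setdefault(m.group(1), []).append(m.group(2))
    return fields


def parse_location(text):
    """Return (start, end) from a feature location, or None if absent."""
    m = LOCATION.search(text)
    return (int(m.group(1)), int(m.group(2))) if m else None


sample = [
    "ID W09405 standard; Protein; 345 AA.",
    "AC W09405;",
    "DT 17-SEP-1997 (first entry)",
    "DE Pineal gland specific gene-1 protein.",
    "OS Homo sapiens.",
    "FT peptide 1..21",
    "FT protein 22.. 344",
]

record = parse_record(sample)
locations = [parse_location(ft) for ft in record["FT"]]
print(record["AC"], locations)
```

Alignment lines (Db, Qy) are skipped by the line-code pattern because their second letter is lower case.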



RESULT 4 

ID R08038 standard; protein; 692 AA. 

AC R08038; 

DT 26-FEB-1991 (first entry) 

DE Rat testicular luteinising hormone/choriogonadotropin receptor. 

KW LH/CG receptor; FSH receptor; TSH receptor; fertility; breast cancer; 

KW prostate cancer; thyroid cancer; osteoporosis; Graves disease; 

KW polycistronic ovarian disease; vasomoter instability. 

OS Rattus rattus. 

PN WO9013643-A. 

PD 15-NOV-1990. 

PF 04-MAY-1990; U02488. 

PR 05-MAY-1989; US-347683. 

PA (GETH ) GENENTECH INC. 

PI Nikolics K, McFarland KC, Segaloff DL, Seeburg PH; 

DR WPI; 90-361478/48. 

DR N-PSDB; Q06634. 

PT Pharmaceutical compsn. contg. hormone receptor mol - used for 

PT treating fertility, breast-and prostate-cancer and osteoporosis, 

PT etc. 

PS Disclosure; Fig 6; 78pp; English. 

CC This rat testicular follicle-stimulating hormone (FSH) receptor. 

CC This receptor is useful in a pharmaceutical compsn. for treating 

CC e.g. breast-, prostate- and thyroid cancer, fertility, osteopor- 

CC osis, vasomoter instability and polycistronic ovarian disease. 

CC LH/CG- or TSH-receptors can also be used, to treat e.g. Graves 

CC disease. Abs can be used to inhibit receptor binding and for imag- 

CC ing and therapy. See also R08015-23, R08035-36 and Q06633. 

SQ Sequence 692 AA; 

Query Match 30.7%; Score 224; DB 2; Length 692; 

Best Local Similarity 32.4%; Pred. No. 1.30e-09; 

Matches 34; Conservative 27; Mismatches 42; Indels 2; Gaps 2; 

Db 23 chcsnrvflcqdskvteiptdlprnaielrfvltklrvipkgsfagfgdlekieisqndv 82 

I III : I: : lll::|| :|:|: :: II |:|: : I :|:|| :: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 83 levieadvfsnlpklheiriekannvlyinpeafqnlpslrylli 127 

: I :| I I I : : :.|:: I |: I ||: ||: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105 

RESULT 5 

ID R87953 standard; Protein; 332 AA. 

AC R87953; 

DT 20-MAR-1996 (first entry) 

DE Bovine neurotrophic biglycan. 

KW Biglycan; proteoglycan; chondroitin sulphate; neuron protection; 

KW neurotrophic; central nervous system; CNS; memory loss; dementia; 

KW learning. 

OS Bos taurus. 

FH Key Location/Qualifiers 

FT region 7..23 

FT /label= Hypervariable_region 

PN WO9530432-A1. 

PD 16-NOV-1995. 

PF 09-MAY-1994; E01479. 

PR 09-MAY-1994; WO-E01479. 

PA (BOEF ) BOEHRINGER MANNHEIM GMBH. 

PI Hasenoehrl R, Huston J, Junghans U, Kappler J, Koops A; 

PI Mueller HW; 

DR WPI; 95-403938/51. 

PT Proteoglycan cpds., partic. chondroitin sulphate proteoglycan(s) - 

PT for maintain structural and function of the CNS and attenuating 

PT memory deficit(s) in the elderly and patients with dementia 

PS Claim 3; Fig 8; 60pp; English. 

CC Bovine biglycan (R87953) is a chondroitin sulphate proteoglycan with 

CC neurotrophic activity for brain neurons. It can be used to enhance 

CC the survival and maintain the structure and function of CNS neurons 



CC during normal ageing as well as after pathological and/or traumatic 

CC nervous system damage. It can also be used to restore function 

CC following nervous system lesions and degenerative diseases, and to 

CC improve learning efficiency and memory in the elderly and in patients 

CC with dementia. 

SQ Sequence 332 AA; 

Query Match 30.5%; Score 223; DB 15; Length 332; 

Best Local Similarity 38.6%; Pred. No. 1.57e-09; 

Matches 34; Conservative 25; Mismatches 26; Indels 3; Gaps 2; 

Db 114 lveippnlpsslvelrihdnrirkvpkgvfsglrnmneiemggnplensgfepgafdglk 173 

hllhlll ::||:|: :| |: :| I |: :::: |::: I : I : I Ml 
Qy 17 LMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI--SDIAPDAFQGLK 74 

Db 174 -lnylriseakltgipkdlpetlnelhl 200 

I I : 1:1 |:| | : I |:| 
Qy 75 SLTSLVLYGNKITEIAKGLFDGLVSLQL 102 



RESULT 6 

ID R05159 standard; protein; 368 AA. 

AC R05159; 

DT 09-OCT-1990 (first entry) 

DE Sequence of human bone proteoglycan I (biglycan). 

KW Osteoporosis; rheumatoid arthritis; Paget's disease; 

KW atherosclerosis; periodontal; human bone matrix; proteoglycan. 

OS Homo sapiens. 

PN US7432044-A. 

PD 17-APR-1990. 

PF 3-NOV-1989; 432044. 

PR 3-NOV-1989; US-432044. 

PA (USSH) Nat Inst of Health. 

PI Termine J; 

DR WPI; 90-178641/23. 

DR N-PSDB; Q04490. 

PT Human bone matrix DNA and proteins - 

PT used in detection, diagnosis and treatment involving skeletal 

PT and/or connective tissue disease states. 

PS Disclosure; p; English. 

CC Probes and Abs raised to the proteins can be used to determine 

CC their levels useful in diagnosis of associated connective tissue 

CC disease states such as osteoporosis, osteo/rheumatoid arthritis, 

CC Paget's disease, atherosclerosis and periodontal disease. 

CC Proteins may also be used to induce or block biological function. 

SQ Sequence 368 AA; 

Query Match 30.5%; Score 223; DB 1; Length 368; 

Best Local Similarity 37.5%; Pred. No. 1.57e-09; 

Matches 33; Conservative 26; Mismatches 26; Indels 3; Gaps 2; 

Db 150 lveippnlpsslvdvrihdnrirkvpkgvfsglrnmnciemggnplensgfepgafdglk 209 

1:111:111 ::|::|: :| |: :| I |: :::: |::: I : I : I ||:||| 
Qy 17 LMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI--SDIAPDAFQGLK 74 

Db 210 -lnylriseakltgipkdlpetlnelhl 236 

II: hi hi I : I hi 
Qy 75 SLTSLVLYGNKITEIAKGLFDGLVSLQL 102 



RESULT 7 

ID R87951 standard; Protein; 369 AA. 

AC R87951; 

DT 20-MAR-1996 (first entry) 

DE Rat neurotrophic biglycan. 

KW Biglycan; proteoglycan; chondroitin sulphate; neuron protection; 

KW neurotrophic; central nervous system; CNS; memory loss; dementia; 

KW learning. 

OS Rattus sp. 

FH Key Location/Qualifiers 

FT peptide 1..37 
FT /label= Sig_peptide 

FT region 44..60 
FT /label= Hypervariable_region 

PN WO9530432-A1. 

PD 16-NOV-1995. 

PF 09-MAY-1994; E01479. 

PR 09-MAY-1994; WO-E01479. 

PA (BOEF ) BOEHRINGER MANNHEIM GMBH. 

PI Hasenoehrl R, Huston J, Junghans U, Kappler J, Koops A; 

PI Mueller HW; 

DR WPI; 95-403938/51. 

DR N-PSDB; T08768. 

PT Proteoglycan cpds., partic. chondroitin sulphate proteoglycan(s) - 

PT for maintain structural and function of the CNS and attenuating 

PT memory deficit(s) in the elderly and patients with dementia 

PS Claim 1; Page 44-45; 60pp; English. 

CC Rat biglycan (R87951) is a chondroitin sulphate proteoglycan with 

CC neurotrophic activity for brain neurons. Recombinant biglycan, 

CC obtd. by expression of encoding cDNA (T08768) in eukaryotic host 

CC cells, can be used to enhance the survival and maintain the structure 

CC and function of CNS neurons during normal ageing as well as after 

CC pathological and/or traumatic nervous system damage. It can also 

CC be used to restore function following nervous system lesions and 
CC degenerative diseases, and to improve learning efficiency and memory 
CC in the elderly and in patients with dementia. 

SQ Sequence 369 AA; 

Query Match 30.3%; Score 221; DB 15; Length 369; 

Best Local Similarity 38.6%; Pred. No. 2.30e-09; 

Matches 34; Conservative 25; Mismatches 26; Indels 3; Gaps 2; 

Db 151 lveippnlpsslvelrihdnrirkvpkgvfsglrnmnciemggnplensgfepgafdglk 210 
1:111:111 ::M:|: :| |: :| I I: :::: |::: I : I : I ||:||| ' 
Qy 17 LMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI--SDIAPDAFQGLK 74 

Db 211 -lnylriseakltgipkdlpetlnelhl 237 

II: 1:1 1:1 I : I |:| 
Qy 75 SLTSLVLYGNKITEIAKGLFDGLVSLQL 102 



RESULT 8 

ID R87952 standard; Protein; 369 AA. 

AC R87952; 

DT 20-MAR-1996 (first entry) 

DE Human neurotrophic biglycan. 

KW Biglycan; proteoglycan; chondroitin sulphate; neuron protection; 

KW neurotrophic; central nervous system; CNS; memory loss; dementia; 

KW learning. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT peptide 1..37 
FT /label= Sig_peptide 
FT region 44..60 

FT /label= Hypervariable_region 

PN WO9530432-A1. 

PD 16-NOV-1995. 

PF 09-MAY-1994; E01479. 

PR 09-MAY-1994; WO-E01479. 

PA (BOEF ) BOEHRINGER MANNHEIM GMBH. 

PI Hasenoehrl R, Huston J, Junghans U, Kappler J, Koops A; 

PI Mueller HW; 

DR WPI; 95-403938/51. 

PT Proteoglycan cpds., partic. chondroitin sulphate proteoglycan(s) - 

PT for maintain structural and function of the CNS and attenuating 

PT memory deficit(s) in the elderly and patients with dementia 

PS Claim 3; Fig 8; 60pp; English. 

CC Human biglycan (R87952) is a chondroitin sulphate proteoglycan with 

CC neurotrophic activity for brain neurons. It can be used to enhance 

CC the survival and maintain the structure and function of CNS neurons 

CC during normal ageing as well as after pathological and/or traumatic 

CC nervous system damage. It can also be used to restore function 

CC following nervous system lesions and degenerative diseases, and to 

CC improve learning efficiency and memory in the elderly and in patients 

CC with dementia. 

SQ Sequence 369 AA; 



Query Match 30.3%; Score 221; DB 15; Length 369; 

Best Local Similarity 38.6%; Pred. No. 2.30e-09; 

Matches 34; Conservative 25; Mismatches 26; Indels 3; Gaps 2; 

Db 151 lveippnlpsslvelrihdnrirkvpkgvfsglrnmnciemggnplensgfepgafdglk 210 

hllhlll ::||:h :| |: :| I h :::: |::: I : I : I ||:||| 
Qy 17 LMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI--SDIAPDAFQGLK 74 

Db 211 -lnylriseakltgipkdlpetlnelhl 237 

II: 1:1 1:1 I : I hi 
Qy 75 SLTSLVLYGNKITEIAKGLFDGLVSLQL 102 



RESULT 9 

ID R30520 standard; protein; 634 AA. 

AC R30520; 

DT 10-MAY-1993 (first entry) 

DE N-terminal of LH receptor/FSH receptor chimaera #29. 

KW Follicle stimulating hormone receptor; luteinising hormone receptor; 

KW human chorionic gonadotropin; glycoprotein hormone receptor; 

KW chimaera; chimera. 

OS Chimaeric; homo sapiens. 

PN WO9222667-A. 

PD 23-DEC-1992. 

PF 12-JUN-1992; U04987. 

PR 14-JUN-1991; US-715911. 

PA (UYNE-) UNIV NEW JERSEY. 

PI Bernard M, Moyle WR, Myers R; 

DR WPI; 93-018150/02. 

PT Glycoprotein hormone receptor analogues - having binding 

PT affinity to human chorionic gonadotrophin, luteinising and 

PT follicle stimulating hormones, useful in bio: immunoassays 

PS Examples; Fig 12; 103pp; English. 

CC This sequence represents the N-terminal of a novel protein having a 

CC binding affinity for human chorionic gonadotrophin (hCG), luteinising 

CC hormone (LH), and follicle stimulating hormone (FSH). The protein 

CC itself is a chimaera having residues from both the FSH receptor, 

CC and LH receptor. The receptor analogues can be used in bioimmunoassays 

CC for the simultaneous detection of both LH (or hCG) and FSH as 

CC well as their ratio of biological activities. The analogues can also 

CC be used for raising, purifying and assaying antibodies to the 

CC analogues. Coding sequence for the chimaera was produced by two step 

CC PCR. 

SQ Sequence 634 AA; 

Query Match 30.3%; Score 221; DB 6; Length 634; 

Best Local Similarity 32.4%; Pred. No. 2.30e-09; 

Matches 34; Conservative 27; Mismatches 42; Indels 2; Gaps 2; 

Db 23 chcsnrvflcqdskvteiptdlprnaielrfvltklrvipkgsfagfgdlekieisqndv 82 

I III : I: : lll::ll :|:|: :: II |:|: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 83 levieadvfsnlpklheiriekannllyinpeafqnlpslrylli 127 

: I :| I I I : : :|:: I \- I II: II: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 10 

ID R30509 standard; protein; 689 AA. 

AC R30509; 

DT 10-MAY-1993 (first entry) 

DE N-terminal of LH receptor/FSH receptor chimaera #18. 

KW Follicle stimulating hormone receptor; luteinising hormone receptor; 

KW human chorionic gonadotrophin; glycoprotein hormone receptor; 

KW chimaera; chimera. 

OS Chimaeric; homo sapiens. 

PN WO9222667-A. 

PD 23-DEC-1992. 

PF 12-JUN-1992; U04987. 

PR 14-JUN-1991; US-715911. 

PA (UYNE-) UNIV NEW JERSEY. 

PI Bernard M, Moyle WR, Myers R; 

DR WPI; 93-018150/02. 

PT Glyco:protein hormone receptor analogues - having binding 

PT affinity to human chorionic gonadotropin, luteinising and 

PT follicle stimulating hormones, useful in bio: immunoassays 

PS Examples; Fig 12; 103pp; English. 

CC This sequence represents the N-terminal of a novel protein having a 

CC binding affinity for human chorionic gonadotropin (hCG), luteinising 

CC hormone (LH), and follicle stimulating hormone (FSH). The protein 

CC itself is a chimaera having residues from both the FSH receptor, 

CC and LH receptor. The receptor analogues can be used in bioimmunoassays 

CC for the simultaneous detection of both LH (or hCG) and FSH as 

CC well as their ratio of biological activities. The analogues can also 

CC be used for raising, purifying and assaying antibodies to the 

CC analogues. Coding sequence for the chimaera was produced by two step 

CC PCR. 

SQ Sequence 689 AA; 

Query Match 30.3%; Score 221; DB 6; Length 689; 

Best Local Similarity 32.4%; Pred. No. 2.30e-09; 

Matches 34; Conservative 27; Mismatches 42; Indels 2; Gaps 2; 



Db 23 chcsnrvflcqdskvteiptdlprnaielrfvltklrvipkgsfagfgdlekieisqndv 82 

I III : I: : lll::|| :|:|: :: II |:|: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 83 levieadvfsnlpklheiriekannllyinpeafqnlpslrylli 127 

: I :| I I I = : :|:: I |: I lh ||: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 11 

ID R30503 standard; protein; 692 AA. 

AC R30503; 

DT 10-MAY-1993 (first entry) 

DE N-terminal of LH receptor/FSH receptor chimaera #10. 

KW Follicle stimulating hormone receptor; luteinising hormone receptor; 

KW human chorionic gonadotropin; glycoprotein hormone receptor; 

KW chimaera; chimera. 

OS Chimaeric; homo sapiens. 

PN WO9222667-A. 

PD 23-DEC-1992. 

PF 12-JUN-1992; U04987. 

PR 14-JUN-1991; US-715911. 

PA (UYNE-) UNIV NEW JERSEY. 

PI Bernard M, Moyle WR, Myers R; 
DR WPI; 93-018150/02. 
PT Glycoprotein hormone receptor analogues - having binding 

PT affinity to human chorionic gonadotropin, luteinising and 

PT follicle stimulating hormones, useful in bio: immunoassays 

PS Examples; Fig 12; 103pp; English. 

CC This sequence represents the N-terminal of a novel protein having a 

CC binding affinity for human chorionic gonadotrophin (hCG), luteinising 

CC hormone (LH), and follicle stimulating hormone (FSH). The protein 

CC itself is a chimaera having residues from both the FSH receptor, 

CC and LH receptor. The receptor analogues can be used in bioimmunoassays 

CC for the simultaneous detection of both LH (or hCG) and FSH as 

CC well as their ratio of biological activities. The analogues can also 

CC be used for raising, purifying and assaying antibodies to the 

CC analogues. Coding sequence for the chimaera was produced by two step 

CC PCR. 

SQ Sequence 692 AA; 

Query Match 30.3%; Score 221; DB 6; Length 692; 

Best Local Similarity 32.4%; Pred. No. 2.30e-09; 
Matches 34; Conservative 27; Mismatches 42; Indels 2; Gaps 2; 

Db 23 chcsnrvflcqdskvteiptdlprnaielrfvltklrvipkgsfagfgdlekieisqndv 82 

I III : I: : ll|::|| :: II |:|: : I :|:|| |:: 

Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 



Db 83 levieadvfsnlpklheiriekannllyinpeafqnlpslrylli 127 
: I :| I I I : : :|:: I h I lh II: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 12 

ID R30506 standard; protein; 695 AA. 

AC R30506; 

DT 10-MAY-1993 (first entry) 

DE N-terminal of LH receptor/FSH receptor chimaera #15. 

KW Follicle stimulating hormone receptor; luteinising hormone receptor; 

KW human chorionic gonadotrophin; glycoprotein hormone receptor; 

KW chimaera; chimera. 

OS Chimaeric; homo sapiens. 

PN WO9222667-A. 

PD 23-DEC-1992. 

PF 12-JUN-1992; U04987. 

PR 14-JUN-1991; US-715911. 

PA (UYNE-) UNIV NEW JERSEY. 

PI Bernard M, Moyle WR, Myers R; 

DR WPI; 93-018150/02. 

PT Glycoprotein hormone receptor analogues - having binding 

PT affinity to human chorionic gonadotrophin, luteinising and 

PT follicle stimulating hormones, useful in bio: immunoassays 

PS Examples; Fig 12; 103pp; English. 

CC This sequence represents the N-terminal of a novel protein having a 

CC binding affinity for human chorionic gonadotrophin (hCG), luteinising 

CC hormone (LH), and follicle stimulating hormone (FSH). The protein 

CC itself is a chimaera having residues from both the FSH receptor, 

CC and LH receptor. The receptor analogues can be used in bioimmunoassays 

CC for the simultaneous detection of both LH (or hCG) and FSH as 

CC well as their ratio of biological activities. The analogues can also 

CC be used for raising, purifying and assaying antibodies to the 

CC analogues. Coding sequence for the chimaera was produced by two step 

CC PCR. 

SQ Sequence 695 AA; 

Query Match 30.3%; Score 221; DB 6; Length 695; 

Best Local Similarity 32.4%; Pred. No. 2.30e-09; 

Matches 34; Conservative 27; Mismatches 42; Indels 2; Gaps 2; 

Db 23 chcsnrvflcqdskvteiptdlprnaielrfvltklrvipkgsfagfgdlekieisqndv 82 

I III : I: : lll::|| :|:|: :: II |:|: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 83 levieadvfsnlpklheiriekannllyinpeafqnlpslrylli 127 

: I :| I I I : : :|:: I |: I lh lh 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 13 

ID R30525 standard; protein; 695 AA. 

AC R30525; 

DT 10-MAY-1993 (first entry) 

DE N-terminal of LH receptor/FSH receptor chimaera #34. 

KW Follicle stimulating hormone receptor; luteinising hormone receptor; 

KW human chorionic gonadotrophin; glycoprotein hormone receptor; 

KW chimaera; chimera. 

OS Chimaeric; homo sapiens. 

PN WO9222667-A. 

PD 23-DEC-1992. 

PF 12-JUN-1992; U04987. 

PR 14-JUN-1991; US-715911. 

PA (UYNE-) UNIV NEW JERSEY. 

PI Bernard M, Moyle WR, Myers R; 

DR WPI; 93-018150/02. 

PT Glycoprotein hormone receptor analogues - having binding 

PT affinity to human chorionic gonadotrophin, luteinising and 

PT follicle stimulating hormones, useful in bio: immunoassays 

PS Examples; Fig 12; 103pp; English. 

CC This sequence represents the N-terminal of a novel protein having a 

CC binding affinity for human chorionic gonadotrophin (hCG), luteinising 

CC hormone (LH), and follicle stimulating hormone (FSH). The protein 

CC itself is a chimaera having residues from both the FSH receptor, 

CC and LH receptor. The receptor analogues can be used in bioimmunoassays 
CC for the simultaneous detection of both LH (or hCG) and FSH as 
CC well as their ratio of biological activities. The analogues can also 
CC be used for raising, purifying and assaying antibodies to the 
CC analogues. Coding sequence for the chimaera was produced by two step 
CC PCR. 

SQ Sequence 695 AA; 



Query Match 30.3%; Score 221; DB 6; Length 695; 

Best Local Similarity 32.4%; Pred. No. 2.30e-09; 

Matches 34; Conservative 27; Mismatches 42; Indels 2; Gaps 2; 

Db 23 chcsnrvflcqdskvteiptdlprnaielrfvltklrvipkgsfagfgdlekieisqndv 82 

I III : I: : lll::|| :|:|: :: II |:|: : | :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 83 levieadvfsnlpklheiriekannllyinpeafqnlpslrylli 127 

: I :| I I I : : :l" I |: I ||: ||: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 14 
ID R30524 standard; protein; 695 AA. 
AC R30524; 

DT 10-MAY-1993 (first entry) 

DE N-terminal of LH receptor/FSH receptor chimaera #33. 

KW Follicle stimulating hormone receptor; luteinising hormone receptor; 

KW human chorionic gonadotrophin; glycoprotein hormone receptor; 

KW chimaera; chimera. 

OS Chimaeric; homo sapiens. 

PN WO9222667-A. 

PD 23-DEC-1992. 

PF 12-JUN-1992; U04987. 

PR 14-JUN-1991; US-715911. 

PA (UYNE-) UNIV NEW JERSEY. 

PI Bernard M, Moyle WR, Myers R; 

DR WPI; 93-018150/02. 

PT Glyco:protein hormone receptor analogues - having binding 
PT affinity to human chorionic gonadotrophin, luteinising and 
PT follicle stimulating hormones, useful in bio: immunoassays 
PS Examples; Fig 12; 103pp; English. 

CC This sequence represents the N-terminal of a novel protein having a 
CC binding affinity for human chorionic gonadotrophin (hCG), luteinising 
CC hormone (LH), and follicle stimulating hormone (FSH). The protein 
CC itself is a chimaera having residues from both the FSH receptor, 
CC and LH receptor. The receptor analogues can be used in bioimmunoassays 
CC for the simultaneous detection of both LH (or hCG) and FSH as 
CC well as their ratio of biological activities. The analogues can also 
CC be used for raising, purifying and assaying antibodies to the 

CC analogues. Coding sequence for the chimaera was produced by two step 
CC PCR. 
SQ Sequence 695 AA; 

Query Match 30.3%; Score 221; DB 6; Length 695; 

Best Local Similarity 32.4%; Pred. No. 2.30e-09;

Matches 34; Conservative 27; Mismatches 42; Indels 2; Gaps 2; 

Db 23 chcsnrvflcqdskvteiptdlprnaielrfvltklrvipkgsfagfgdlekieisqndv 82 

I III : I: : ll|::|| :|:|: :: II |;|: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62

Db 83 levieadvfsnlpklheiriekannllyinpeafqnlpslrylli 127 

: I :| I I I : : :|:: I |: I ||: ||: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105 



OS Chimaeric; homo sapiens.
PN WO9222667-A.
PD 23-DEC-1992.
PF 12-JUN-1992; U04987.
PR 14-JUN-1991; US-715911.
PA (UYNE-) UNIV NEW JERSEY.
PI Bernard M, Moyle WR, Myers R;
DR WPI; 93-018150/02.

PT Glyco:protein hormone receptor analogues - having binding
PT affinity to human chorionic gonadotrophin, luteinising and
PT follicle stimulating hormones, useful in bio:immunoassays
PS Examples; Fig 12; 103pp; English.

CC This sequence represents the N-terminal of a novel protein having a
CC binding affinity for human chorionic gonadotrophin (hCG), luteinising
CC hormone (LH), and follicle stimulating hormone (FSH). The protein
CC itself is a chimaera having residues from both the FSH receptor
CC and LH receptor. The receptor analogues can be used in bioimmunoassays
CC for the simultaneous detection of both LH (or hCG) and FSH as
CC well as their ratio of biological activities. The analogues can also
CC be used for raising, purifying and assaying antibodies to the
CC analogues. Coding sequence for the chimaera was produced by two step
CC PCR.

Sequence 696 AA;



Query Match 30.3%; Score 221; DB 6; Length 696; 

Best Local Similarity 32.4%; Pred. No. 2.30e-09;

Matches 34; Conservative 27; Mismatches 42; Indels 2; Gaps 2;

Db 23 chcsnrvflcqdskvteiptdlprnaielrfvltklrvipkgsfagfgdlekieisqndv 82 

I III = I: : ll|::|| :|:|: :: II |:|: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62

Db 83 levieadvfsnlpklheiriekannllyinpeafqnlpslrylli 127 

: I :| I I I : : :|:: I h I l|: ||: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105



Search completed: Fri May 28 C 
Job time : 47 secs.



RESULT 15 

ID R30523 standard; protein; 696 AA. 

AC R30523; 

DT 10-MAY-1993 (first entry) 

DE N-terminal of LH receptor/FSH receptor chimaera 132.

KW Follicle stimulating hormone receptor; luteinising hormone receptor; 

KW human chorionic gonadotrophin; glycoprotein hormone receptor; 

KW chimaera; chimera. 
Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 08:42:30 1999; MasPar time 7.20 Seconds

584.123 Million cell updates/sec

Tabular output not generated.

Title: MJS-09-191-647-3

Description: (1-105) from US09191647.pep

Perfect Score: 730 

Sequence : 1 SPCTCSNNIVDCRGKGLMEI ITEIAKGLFDGLVSLQLLLL 105 



Scoring table: 



PAM 150 
Gap 11 
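The header names the Smith-Waterman algorithm with a PAM 150 scoring table and a gap penalty of 11. The following is a minimal sketch of the local-alignment recurrence only. The PAM 150 table is not reproduced in this report, so `toy_score` is a hypothetical match/mismatch scorer, and the linear gap model is an assumption; the search program's own scoring may differ.

```python
def smith_waterman(a, b, score, gap):
    """Best local alignment score of a against b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            curr[j] = max(
                0,                                        # start a new local alignment
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align two residues
                prev[j] - gap,                            # gap in b
                curr[j - 1] - gap,                        # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def toy_score(x, y):
    # Hypothetical stand-in for the PAM 150 table: +5 identity, -4 otherwise.
    return 5 if x == y else -4


if __name__ == "__main__":
    query_start = "SPCTCSNNIVDCRGKGLMEI"  # first 20 residues shown in the header
    print(smith_waterman(query_start, query_start, toy_score, gap=11))
```

Scoring a sequence against itself gives the largest value that scorer can produce for it, which is the role the "Perfect Score" line plays in the report.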



122810 seqs, 40068593 residues 
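The throughput figure in the header can be cross-checked from the other header values. The sketch below assumes the usual convention that one "cell update" is one query-residue by database-residue comparison; the function name is illustrative, and the small difference from the printed figure is presumably due to the run time being shown to only two decimals.

```python
def cell_updates_per_second(query_len, db_residues, seconds):
    """Dynamic-programming cells filled per second of search time."""
    return query_len * db_residues / seconds


if __name__ == "__main__":
    # 105-residue query, 40068593 database residues, 7.20 s MasPar time.
    rate = cell_updates_per_second(105, 40068593, 7.20)
    print(f"{rate / 1e6:.1f} million cell updates/sec")
```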



Post-processing: Minimum Match 0% 

Listing first 45 summaries 

Database: pir60 

1:pir1 2:pir2 3:pir3 4:pir4

Statistics: Mean 42.417; Variance 97.098; scale 0.437

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



SUMMARIES

Result         Query
   No.  Score  Match  Length  DB  ID      Description            Pred. No.

     1    459   62.9    1469   2  B36665  slit protein 2 precur  1.55e-57
     2    459   62.9    1480   2  A36665  slit protein 1 precur  1.55e-57
     3    223   30.5     694   2  JC4301  follitropin receptor   3.26e-18
     4    221   30.3     368   1  BGHUN   biglycan precursor -   6.69e-18
     5    221   30.3     369   2  S32793  biglycan precursor -   6.69e-18
     6    221   30.3     369   2  S20811  proteoglycan I - mous  6.69e-18
     7    221   30.3     369   2  S32559  biglycan precursor -   6.69e-18
     8    221   30.3     692      A34548  follitropin receptor   6.69e-18
     9    221   30.3     694      JC2237  follitropin receptor   6.69e-18
    10    220   30.1     695      I45896  follicle stimulating   9.58e-18
    11    219   30.0     695      JC1493  follitropin receptor   1.37e-17
    12    218   29.9     695      JN0898  follitropin receptor   1.96e-17
    13    217   29.7     695      QRBUFT  follitropin receptor   2.80e-17
    14    214   29.3    1091      A58532  glial cell membrane g  8.16e-17
    15    209   28.6     313      G02020  p37NB - human          4.81e-16
    16    204   27.9     357      S24317  decorin precursor - c  2.81e-15
    17    200   27.4     605      JC5239  insulin-like growth f  1.15e-14
    18    200   27.4     662      S42799  garp precursor - huma  1.15e-14
    19    198   27.1     605      A41915  insulin-like growth f  2.31e-14
    20    197   27.0     354      S29145  decorin precursor - r  3.27e-14
    21    193   26.4     360      S06280  decorin precursor - b  1.32e-13
    22    191   26.2     360      I47020  decorin - rabbit       2.64e-13
    23    190   26.0     560      A60164  platelet membrane gly  3.73e-13
    24    188   25.8     359      NBHUC8  decorin precursor - h  7.44e-13
    25    187   25.6     603      JC1282  insulin-like growth f  1.05e-12
    26    186   25.5     354      A55454  decorin precursor - m  1.48e-12
    27    184   25.2     322      S72271  proteoglycan Lb precu  2.95e-12
    28    184   25.2     361      A53860  chondroadherin precur  2.95e-12
    29    182   24.9     603      JC6128  insulin-like growth f  5.85e-12
    30    180   24.7     316      A41781  proteoglycan-Lb - chi  1.16e-11
    31    179   24.5     382      I39068  proline-arginine-ric   1.63e-11
    32    175   24.0    1115      S40241  G protein-coupled rec  6.33e-11
    33    173   23.7     907      JE0176  orphan G protein-coup  1.24e-10
    34    170   23.3     536      A34901  lysine carboxypeptida  3.41e-10
    35    170   23.3    1134      A29944  chaoptin precursor -   3.41e-10
    36    155   21.2     682      A49121  cell-surface molecule  4.87e-08
    37    155   21.2     682      A43318  connectin precursor -  4.87e-08
    38    153   21.0    1535      S46224  peroxidasin - fruit f  9.34e-08
    39    152   20.8     626      NBHUIA  platelet glycoprotein  1.29e-07
    40    148   20.3     343      A41748  lumican precursor - c  4.69e-07
    41    148   20.3     375      S05390  fibromodulin precurso  4.69e-07
    42    148   20.3     440      A39613  oligodendrocyte-myeli  4.69e-07
    43    148   20.3     440      A47530  oligodendrocyte-myeli  4.69e-07
    44    145   19.9    1025      A57676  protein kinase Xa21 (  1.22e-06
    45    142   19.5     176      A46606  platelet glycoprotein  3.17e-06



RESULT 1
ENTRY B36665 #type complete
TITLE slit protein 2 precursor - fruit fly (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change 16-Dec-1998
ACCESSIONS B36665
REFERENCE A36665
   #authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.;
      Artavanis-Tsakonas, S.
   #journal Genes Dev. (1990) 4:2169-2187
   #title slit: an extracellular protein necessary for development of
      midline glia and commissural axon pathways contains both
      EGF and LRR domains.
   #cross-references MUID:91099665
   #accession B36665
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-1469 ##label ROT
   ##cross-references GB:X53959
GENETICS
   #gene FlyBase:sli
   ##cross-references FlyBase:FBgn0003425
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF
   homology; leucine-rich alpha-2-glycoprotein repeat
   homology; proteoglycan carboxyl-terminal homology



FEATURE
   66-91      #domain proteoglycan amino-terminal homology #label PAH1\
              #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
              #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
              #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
              #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   197-220    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
              #domain proteoglycan carboxyl-terminal homology #label PCS1\
              #domain proteoglycan amino-terminal homology #label PAH2\
              #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
              #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   371-394    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   395-418    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   419-442    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
   450-494    #domain proteoglycan carboxyl-terminal homology #label PCS2\
   512-537    #domain proteoglycan amino-terminal homology #label PAH3\
   547-571    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
   572-595    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
   596-619    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
   620-643    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
   651-695    #domain proteoglycan carboxyl-terminal homology #label PCS3\
   708-733    #domain proteoglycan amino-terminal homology #label PAH4\
   743-766    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
   767-790    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
   846-890    #domain proteoglycan carboxyl-terminal homology #label PCS4\
   1028-1061  #domain EGF homology #label EGF

SUMMARY #length 1469 #molecular-weight 164695 #checksum 8361



Query Match 62.9%; Score 459; DB 2; Length 1469;

Best Local Similarity 53.8%; Pred. No. 1.55e-57;

Matches 56; Conservative 24; Mismatches 24; Indels 0; Gaps 0;



Db 298 PCRCADGIVDCREKSLTSVPVTLPDDTTDVRLEQNFITELPPKSFSSFRRLRRIDLSNNN 357 

II h: Mill hi H lh ::||||| I :|: :|: :::|:|||:|:| 
Qy 2 PCTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQ 61

Db 358 ISRIAHDALSGLKQLTTLVLYGNKIKDLPSGVFKGLGSLRLLLL 401 

II II II: III Ihllllllll ::: |:| II Ihllll 
Qy 62 ISDIAPDAFQGLKSLTSLVLYGNKITEIAKGLFDGLVSLQLLLL 105 
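The "Best Local Similarity" figures in this report are consistent with identical matches divided by the total number of alignment columns (matches, conservative substitutions, mismatches and indels). This is an inference from the printed counts, not a definition given in the report, and the function name is illustrative.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percentage of alignment columns that are identical matches."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


if __name__ == "__main__":
    # Counts printed for the slit protein alignment: 56 / 24 / 24 / 0.
    print(best_local_similarity(56, 24, 24, 0))
```

The same relation reproduces the values printed for the other alignments, for example 34, 25, 26 and 3 for the biglycan hits.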



RESULT 2
ENTRY A36665 #type complete
TITLE slit protein 1 precursor - fruit fly (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change 24-Sep-1998
ACCESSIONS A36665; S13523
REFERENCE A36665
   #authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.;
      Artavanis-Tsakonas, S.
   #journal Genes Dev. (1990) 4:2169-2187
   #title slit: an extracellular protein necessary for development of
      midline glia and commissural axon pathways contains both
      EGF and LRR domains.
   #cross-references MUID:91099665
   #accession A36665
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-1480 ##label ROT
   ##cross-references GB:X53959; NID:g8614; PID:g8615
GENETICS
   #gene FlyBase:sli
   ##cross-references FlyBase:FBgn0003425
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF
   homology; leucine-rich alpha-2-glycoprotein repeat
   homology; proteoglycan carboxyl-terminal homology
KEYWORDS alternative splicing



FEATURE
   66-91      #domain proteoglycan amino-terminal homology #label PAH1\
              #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
              #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
              #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
              #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
              #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   228-272    #domain proteoglycan carboxyl-terminal homology #label PCS1\
   288-313    #domain proteoglycan amino-terminal homology #label PAH2\
   323-346    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   347-370    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   371-394    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   395-418    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   419-442    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
   450-494    #domain proteoglycan carboxyl-terminal homology #label PCS2\
   512-537    #domain proteoglycan amino-terminal homology #label PAH3\
   547-571    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
   572-595    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
   596-619    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
   620-643    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
   651-695    #domain proteoglycan carboxyl-terminal homology #label PCS3\
   708-733    #domain proteoglycan amino-terminal homology #label PAH4\
   743-766    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
   767-790    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
   791-814    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR17\
   815-838    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR18\
   846-890    #domain proteoglycan carboxyl-terminal homology #label PCS4\
   1028-1061  #domain EGF homology #label EGF
SUMMARY #length 1480 #molecular-weight 165751 #checksum 900



Query Match 62.9%; Score 459; DB 2; Length 1480; 

Best Local Similarity 53.8%; Pred. No. 1.55e-57;

Matches 56; Conservative 24; Mismatches 24; Indels 0; Gaps 0;

Db 298 PCRCADGIVDCREKSLTSVPVTLPDDTTDVRLEQNFITELPPKSFSSFRRLRRIDLSNNN 357 

II I:: Mill hi :| l|: ::||||| I :|: :|: :::|:|||:|:| 
Qy 2 PCTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQ 61

Db 358 ISRIAHDALSGLKQLTTLVLYGNKIKDLPSGVFKGLGSLRLLLL 401 

II II II: III Ihlllllil! ::: |:| II Ihllll 
Qy 62 ISDIAPDAFQGLKSLTSLVLYGNKITEIAKGLFDGLVSLQLLLL 105



RESULT 3 

ENTRY JC4301 #type complete

TITLE follitropin receptor - pig

ALTERNATE_NAMES follicle-stimulating hormone receptor
ORGANISM #formal_name Sus scrofa domestica #common_name domestic pig
DATE 16-Nov-1995 #sequence_revision 08-Feb-1996 #text_change 17-Mar-1999
ACCESSIONS JC4301
REFERENCE JC4301
   #authors Remy, J.J.; Lahbib-Mansais, Y.; Yerle, M.; Bozon, V.;
      Couture, L.; Pajot, E.; Grebert, D.; Salesse, R.
   #journal Gene (1995) 163:257-261
   #title The porcine follitropin receptor: cDNA cloning, functional
      expression and chromosomal localization of the gene.
   #cross-references MUID:96011644
   #accession JC4301
   ##molecule_type mRNA
   ##residues 1-694 ##label REM
   ##cross-references GB:L31966
   ##experimental_source ovarian granulosa cells
COMMENT This receptor belongs to the family of the G-protein coupled
   receptors. It has the functional roles of high affinity for
   follicle-stimulating hormone. It also plays an essential role in
   spermatogenesis in male and oogenesis in female.
GENETICS
   #gene fshr
   #map_position 3q2.2-q2.3
CLASSIFICATION #superfamily glycoprotein hormone receptor; leucine-rich
   alpha-2-glycoprotein repeat homology
KEYWORDS G protein-coupled receptor; hormone receptor; transmembrane
   protein

FEATURE
   1-365      #domain follicle-stimulating hormone binding #status predicted #label HOB\
   366-388    #domain transmembrane #status predicted #label TM1\
   398-420    #domain transmembrane #status predicted #label TM2\
   443-464    #domain transmembrane #status predicted #label TM3\
   485-507    #domain transmembrane #status predicted #label TM4\
   528-549    #domain transmembrane #status predicted #label TM5\
   573-596    #domain transmembrane #status predicted #label TM6\
   608-629    #domain transmembrane #status predicted #label TM7

SUMMARY #length 694 #molecular-weight 78278 #checksum 409

Query Match 30.5%; Score 223; DB 2; Length 694; 

Best Local Similarity 33.3%; Pred. No. 3.26e-18;

Matches 35; Conservative 27; Mismatches 41; Indels 2; Gaps 2; 
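Each hit ends with the same three semicolon-delimited statistics lines, so they lend themselves to mechanical extraction. The sketch below is one possible way to read them into a dictionary; the regular expression and the function name are assumptions made for illustration and are not part of the search program.

```python
import re

# A field is a label (letters, dots, spaces), a number, an optional '%', then ';'.
FIELD = re.compile(r"([A-Za-z][A-Za-z. ]*?)\s+(-?[\d.]+(?:e[-+]?\d+)?)%?;")


def parse_stats(text):
    """Map each statistic label in the text to its numeric value."""
    return {name.strip(): float(value) for name, value in FIELD.findall(text)}


if __name__ == "__main__":
    block = (
        "Query Match 30.5%; Score 223; DB 2; Length 694;\n"
        "Best Local Similarity 33.3%; Pred. No. 3.26e-18;\n"
        "Matches 35; Conservative 27; Mismatches 41; Indels 2; Gaps 2;\n"
    )
    for label, value in parse_stats(block).items():
        print(label, value)
```

A parser of this kind only works once decimal points and percent signs have been restored, which is why values such as "33,3%" need correcting first.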

Db 22 CHCSNGVFLCQESKVTEIPPDLPRNAVELRFVLTKLRAIPKGAFSGFGDLEKIEISQNDV 81

I III : I: : lll::|l 1 1 : 1 : ::||| III: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62



Db 82 LEVIEANVFSNLPKLHEIRIEKANNLLYIDPDAFQNLPNLRYLLI 126
: I :: I I I : : :|:: I |: I :|: ||: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105



RESULT 4 

ENTRY BGHUN #type complete

TITLE biglycan precursor - human

ALTERNATE_NAMES cartilage proteoglycan protein I; dermatan sulfate
   proteoglycan I (DS-PGI); proteoglycan I core protein (PG-I)
ORGANISM #formal_name Homo sapiens #common_name man
DATE 21-Apr-1992 #sequence_revision 26-May-1995 #text_change 14-Nov-1997
ACCESSIONS A40757; A32458; S05639; A28457; S14349
REFERENCE A40757
   #authors Fisher, L.W.; Heegaard, A.M.; Vetter, U.; Vogel, W.; Just,
      W.; Termine, J.D.; Young, M.F.
   #journal J. Biol. Chem. (1991) 266:14371-14377
   #title Human biglycan gene. Putative promoter, intron-exon
      junctions, and chromosomal localization.
   #cross-references MUID:91317791
   #accession A40757
   ##molecule_type DNA
   ##residues 1-368 ##label FIS
   ##cross-references GB:M65151; GB:M65152; GB:M65153
REFERENCE A32458
   #authors Fisher, L.W.; Termine, J.D.; Young, M.F.
   #journal J. Biol. Chem. (1989) 264:4571-4576
   #title Deduced protein sequence of bone small proteoglycan I
      (Biglycan) shows homology with proteoglycan II (Decorin)
      and several nonconnective tissue proteins in a variety of
      species.
   #cross-references MUID:89174714
   #accession A32458
   ##molecule_type mRNA
   ##residues 1-138, 'NV', 141-162, 'DV', 165-368 ##label FI2
   ##cross-references GB:J04599
   ##note parts of this sequence, including the amino end of the
      mature protein, were determined by protein sequencing
REFERENCE S05639
   #authors Roughley, P.J.; White, R.J.
   #journal Biochem. J. (1989) 262:823-827
   #title Dermatan sulphate proteoglycans of human articular cartilage.
      The properties of dermatan sulphate proteoglycans I and II.
   #cross-references MUID:90073579
   #accession S05639
   ##molecule_type protein
   ##residues 38-41, 'X', 43-46, 'X', 48-57 ##label ROU
REFERENCE A92656
   #authors Fisher, L.W.; Hawkins, G.R.; Tuross, N.; Termine, J.D.
   #journal J. Biol. Chem. (1987) 262:9702-9708
   #title Purification and partial characterization of small
      proteoglycans I and II, bone sialoproteins I and II, and
      osteonectin from the mineral compartment of developing
      human bone.
   #cross-references MUID:87250639
   #accession A28457
   ##molecule_type protein
   ##residues 38-41, 'X', 43-62, 'X', 64-66 ##label FI3
   ##experimental_source bone
REFERENCE S14349
   #authors Stoecker, G.; Meyer, H.E.; Wagener, C.; Greiling, H.
   #journal Biochem. J. (1991) 274:415-420
   #title Purification and N-terminal amino acid sequence of a
      chondroitin sulphate/dermatan sulphate proteoglycan
      isolated from intima/media preparations of human aorta.
   #cross-references MUID:91174749
   #accession S14349
   ##molecule_type protein
   ##residues 38-57 ##label STO
   ##experimental_source aorta
GENETICS
   #gene GDB:BGN
   ##cross-references GDB:119727; OMIM:301870
   #map_position Xq28-Xq28
   #introns 80/1; 117/3; 189/1; 226/1; 257/2; 303/3
CLASSIFICATION #superfamily decorin; leucine-rich alpha-2-glycoprotein
   repeat homology; proteoglycan amino-terminal homology;
   proteoglycan carboxyl-terminal homology
KEYWORDS chondroitin sulfate proteoglycan; dermatan sulfate;
   duplication; extracellular matrix; glycoprotein; tandem
   repeat

FEATURE
   1-16       #domain signal sequence #status predicted #label SIG\
   17-37      #domain propeptide #status predicted #label PRO\
   38-368     #product biglycan #status predicted #label MAT\
   57-81      #domain proteoglycan amino-terminal homology #label PAH\
   91-114     #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   115-138    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   139-159    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   160-183    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   184-207    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   209-229    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   230-253    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   254-277    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   278-300    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   301-315    #domain leucine-rich alpha-2-glycoprotein repeat homology #status atypical #label LR10\
   316-368    #domain proteoglycan carboxyl-terminal homology #label PCH\
   42,47      #binding_site dermatan sulfate (Ser) (covalent) #status experimental\
   180,198    #binding_site dermatan sulfate (Ser) (covalent) #status predicted\
   270,311    #binding_site carbohydrate (Asn) (covalent) #status predicted

SUMMARY #length 368 #molecular-weight 41654 #checksum 1684

Query Match 30.3%; Score 221; DB 1; Length 368;

Best Local Similarity 38.6%; Pred. No. 6.69e-18; 

Matches 34; Conservative 25; Mismatches 26; Indels 3; Gaps 2;

Db 150 LVEIPPNLPSSLVELRIHDNRIRKVPKGVFSGLRNMNCIEMGGNPLENSGFEPGAFDGLK 209
■ MINI! ::||:|: :| |: :| I |: :::: |::: | : | : | 1 1 : 1 ! I 
Qy 17 LMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI--SDIAPDAFQGLK 74

Db 210 -LNYLRISEAKLTGIPKDLPETLNELHL 236

II: 1:1 hi I : I hi 
Qy 75 SLTSLVLYGNKITEIAKGLFDGLVSLQL 102



RESULT 5 

ENTRY S32793 #type complete

TITLE biglycan precursor - rat

ALTERNATE_NAMES dermatan sulfate proteoglycan I (DS-PGI); proteoglycan I core
   protein (PG-I)
ORGANISM #formal_name Rattus norvegicus #common_name Norway rat
DATE 02-Dec-1993 #sequence_revision 01-Sep-1995 #text_change 29-Jan-1999
ACCESSIONS S32793
REFERENCE S32793
   #authors Dreher, K.L.; Asundi, V.; Matzura, D.; Cowan, K.
   #journal Eur. J. Cell Biol. (1990) 53:296-304
   #title Vascular smooth muscle biglycan represents a highly conserved
      proteoglycan within the arterial wall.
   #cross-references MUID:91184222
   #accession S32793
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-369 ##label DRE
   ##cross-references GB:U17834; NID:g600497; PID:g600498
CLASSIFICATION #superfamily decorin; leucine-rich alpha-2-glycoprotein
   repeat homology; proteoglycan amino-terminal homology;
   proteoglycan carboxyl-terminal homology
KEYWORDS chondroitin sulfate proteoglycan; dermatan sulfate;
   extracellular matrix; glycoprotein

FEATURE
   1-16       #domain signal sequence #status predicted #label SIG\
   17-37      #domain propeptide #status predicted #label PRO\
   38-369     #product biglycan #status predicted #label MAT\
   58-82      #domain proteoglycan amino-terminal homology #label PAH\
   92-115     #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   116-139    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   140-160    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   161-184    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   185-208    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   210-230    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   231-254    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   255-278    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   279-301    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   302-316    #domain leucine-rich alpha-2-glycoprotein repeat homology #status atypical #label LR10\
   317-369    #domain proteoglycan carboxyl-terminal homology #label PCH\
   42,48,181,199 #binding_site dermatan sulfate (Ser) (covalent) #status
   271,312    #binding_site carbohydrate (Asn) (covalent) #status predicted

SUMMARY #length 369 #molecular-weight 41706 #checksum 3056

Query Match 30.3%; Score 221; DB 2; Length 369; 

Best Local Similarity 38.6%; Pred. No. 6.69e-18;

Matches 34; Conservative 25; Mismatches 26; Indels 3; Gaps 2; 

Db 151 LVEIPPNLPSSLVELRIHDNRIRKVPKGVFSGLRNMNCIEMGGNPLENSGFEPGAFDGLK 210 

|:|||:MI ::!!:!: :| |: :| I h ::: | : | : | |:l| 

Qy 17 LMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI--SDIAPDAFQGLK 74

Db 211 -LNYLRISEAKLTGIPKDLPETLNELHL 237 

II: 1:1 hi i : : 
Qy 75 SLTSLVLYGNKITEIAKGLFDGLVSLQL 102 



RESULT 6 

ENTRY S20811 #type complete

TITLE proteoglycan I - mouse

ALTERNATE_NAMES biglycan
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 20-Feb-1995 #sequence_revision 20-Feb-1995 #text_change 05-Dec-1997
ACCESSIONS S20811; A57645; I49534
REFERENCE S20811
   #authors Naitoh, Y.; Suzuki, S.
   #submission submitted to the EMBL Data Library, July 1990
   #description Nucleotide sequences of cDNAs encoding mouse PGI and PGII.
   #accession S20811
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-369 ##label NAI
   ##cross-references EMBL:X53928; NID:g53666; PID:g53667
REFERENCE A57645
   #authors Wegrowski, Y.; Pillarisetti, J.; Danielson, K.G.; Suzuki, S.;
      Iozzo, R.V.
   #journal Genomics (1995) 30:8-17
   #title The murine biglycan: complete cDNA cloning, genomic
      organization, promoter function, and expression.
   #cross-references MUID:96129295
   #accession A57645
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-67, 'W', 69-369 ##label WEG
   ##cross-references GB:L20276; NID:g348961; PID:g348962
   ##note authors translated the codon TGG for residue 58 as Cys
REFERENCE I49534
   #authors Rau, W.; Just, W.; Vetter, U.; Vogel, W.
   #journal Mamm. Genome (1994) 5:395-396
   #title A dinucleotide repeat in the mouse biglycan gene (EST) on the
      X chromosome.
   #cross-references MUID:94319093
   #accession I49534
   ##status preliminary; translated from GB/EMBL/DDBJ
   ##molecule_type mRNA
   ##residues 1-67, 'W', 69-369 ##label RES
   ##cross-references GB:L20276; NID:g348961; PID:g348962
GENETICS
   #gene Bgn
CLASSIFICATION #superfamily decorin; leucine-rich alpha-2-glycoprotein
   repeat homology; proteoglycan amino-terminal homology;
KEYWORDS chondroitin sulfate proteoglycan; dermatan sulfate;
   extracellular matrix; glycoprotein


FEATURE
   58-82      #domain proteoglycan amino-terminal homology #label PAH\
   92-115     #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   116-139    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   140-160    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   161-184    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   185-208    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   210-230    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   231-254    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   255-278    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   279-301    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   302-316    #domain leucine-rich alpha-2-glycoprotein repeat homology #status atypical #label LR10\
   317-369    #domain proteoglycan carboxyl-terminal homology #label

SUMMARY #length 369 #molecular-weight 41639 #checksum 3586

Query Match 30.3%; Score 221; DB 2; Length 369; 

Best Local Similarity 38.6%; Pred. No. 6.69e-18;

Matches 34; Conservative 25; Mismatches 26; Indels 3; Gaps 2; 

Db 151 LVEIPPNLPSSLVELRIHDNRIRKVPKGVFSGLRNMNCIEMGGNPLENSGFEPGAFDGLK 210

hllhlll :| |: :| I |: |;:: I : I : I Ml 

Qy 17 LMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI--SDIAPDAFQGLK 74

Db 211 -LNYLRISEAKLTGIPKDLPETLNELHL 237 

II: 1:1 1:1 I : I hi 
Qy 75 SLTSLVLYGNKITEIAKGLFDGLVSLQL 102 



RESULT 7 

ENTRY S32559 #type complete

TITLE biglycan precursor - bovine

ALTERNATE_NAMES dermatan sulfate proteoglycan I (DS-PGI); proteochondroitin
   core protein; proteoglycan I core protein (PG-I)
ORGANISM #formal_name Bos primigenius taurus #common_name cattle
DATE 03-May-1994 #sequence_revision 20-Feb-1995 #text_change 14-Nov-1997
ACCESSIONS S32559; S34229; A33701; A31430; PT0078; S55673; A33137
REFERENCE S32559
   #authors Torok, M.A.; Evans, S.A.S.; Marcum, J.A.
   #journal Biochim. Biophys. Acta (1993) 1173:81-84
   #title cDNA sequence for bovine biglycan (PGI) protein core.
   #accession S32559
   ##molecule_type mRNA
   ##residues 1-369 ##label TOR
   ##cross-references EMBL:L07953; NID:g162746
   ##experimental_source aortic smooth muscle
REFERENCE S34229
   #authors Marcum, J.A.; Torok, M.; Evans, S.
   #submission submitted to the EMBL Data Library, December 1992
   #accession S34229
   ##molecule_type mRNA
   ##residues 1-250, 'V', 252-369 ##label MAR
   ##cross-references EMBL:L07953
REFERENCE A33701
   #authors Neame, P.J.; Choi, H.U.; Rosenberg, L.C.
   #journal J. Biol. Chem. (1989) 264:8653-8661
   #title The primary structure of the core protein of the small,
      leucine-rich proteoglycan (PG I) from bovine articular
      cartilage.
   #cross-references MUID:89255324
   #accession A33701
   ##molecule_type protein
   ##residues 38-187, 'E', 189-367, 'Y' ##label NEA
   ##experimental_source cartilage
REFERENCE A31430
   #authors Choi, H.U.; Johnson, T.L.; Pal, S.; Tang, L.H.; Rosenberg,
      L.; Neame, P.J.
   #journal J. Biol. Chem. (1989) 264:2876-2884
   #title Characterization of the dermatan sulfate proteoglycans,
      DS-PGI and DS-PGII, from bovine articular cartilage and
      skin isolated by octyl-sepharose chromatography.
   #cross-references MUID:89123388
   #accession A31430
   ##molecule_type protein
   ##residues 38-41, 'X', 43-47, 'X', 49-63 ##label CHO
   ##note sequences from skin and cartilage were identical
REFERENCE PT0077
   #authors Marcum, J.A.; Thompson, M.A.
   #journal Biochem. Biophys. Res. Commun. (1991) 175:706-712
   #title The amino-terminal region of a proteochondroitin core
      protein, secreted by aortic smooth muscle cells, shares
      sequence homology with the pre-propeptide region of the
      biglycan core protein from human bone.
   #cross-references MUID:91207372
   #accession PT0078
   ##molecule_type protein
   ##residues 17-24, 'F', 26-30 ##label MA2
   ##experimental_source aortic smooth muscle
REFERENCE S55673
   #authors Scott, P.G.; Nakano, T.; Dodd, C.M.
   #journal Biochim. Biophys. Acta (1995) 1244:121-128
   #title Small proteoglycans from different regions of the
      fibrocartilaginous temporomandibular joint disc.
   #cross-references MUID:95284073
   #accession S55673
   ##molecule_type protein
   ##residues 38-41, 'X', 43-47, 'X', 49-53 ##label SCO
CLASSIFICATION #superfamily decorin; leucine-rich alpha-2-glycoprotein
   repeat homology; proteoglycan amino-terminal homology;
   proteoglycan carboxyl-terminal homology
KEYWORDS cartilage; chondroitin sulfate proteoglycan; dermatan
   sulfate; extracellular matrix; glycoprotein

FEATURE
   1-16       #domain signal sequence #status predicted #label SIG\
   17-37      #domain amino-terminal propeptide #status predicted #label PRO\
   38-369     #product biglycan #status predicted #label MAT\
   58-82      #domain proteoglycan amino-terminal homology #label PAH\
   92-115     #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   116-139    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   140-160    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   161-184    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   185-208    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   210-230    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   231-254    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   255-278    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   279-301    #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   302-316    #domain leucine-rich alpha-2-glycoprotein repeat homology #status atypical #label LR10\
   317-369    #domain proteoglycan carboxyl-terminal homology #label PCH\
   42,48      #binding_site dermatan sulfate (Ser) (covalent) #status experimental\
   181,199    #binding_site dermatan sulfate (Ser) (covalent) #status predicted\
   271,312    #binding_site carbohydrate (Asn) (covalent) #status predicted

SUMMARY #length 369 #molecular-weight 41590 #checksum 1525

Query Match 30.3%; Score 221; DB 2; Length 369; 

Best Local Similarity 38.6%; Pred. No. 6.69e-18;

Matches 34; Conservative 25; Mismatches 26; Indels 3; Gaps 2; 

Db 151 LVEIPPNLPSSLVELRIHDNRIRKVPRGVFSGLRNMNCIEMGGNPLENSGFEPGAFDGLR 210 

1:111:111 ::||:|: :| |: :| I |: :::: |::: I : I : I ||:||| 
Qy 17 LMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISRNQI-SDIAPDAFQGLK 74 

Db 211 -LNYLRISEARLTGIPKDLPETLNELHL 237 

II: 1:1 1:1 I : I |:| 
Qy 75 SLTSLVLYGNKITEIAKGLFDGLVSLQL 102 
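
The alignment statistics printed with each result appear to be related in a simple way: Best Local Similarity is consistent with Matches divided by the total number of alignment columns (Matches + Conservative + Mismatches + Indels). For the result above, 34 / (34 + 25 + 26 + 3) = 38.6%. The short Python sketch below checks that relationship; the function name is hypothetical, and the formula is an inference from the figures in this listing rather than something the program states.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent identity over all alignment columns (assumed formula)."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


# Figures taken from the result blocks in this listing.
print(best_local_similarity(34, 25, 26, 3))   # biglycan hit, Length 369
print(best_local_similarity(34, 27, 42, 2))   # follitropin receptor hits
```

The same calculation reproduces the 32.4%, 31.4%, 42.5% and 32.9% values reported for the other results in this section.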



RESULT 8
ENTRY A34548 #type complete
TITLE follitropin receptor precursor - rat
ORGANISM #formal_name Rattus norvegicus #common_name Norway rat
DATE 22-Jan-1993 #sequence_revision 22-Jan-1993 #text_change 29-Jan-1999
ACCESSIONS A34548; A41729
REFERENCE A34548
   #authors Sprengel, R.; Braun, T.; Nikolics, K.; Segaloff, D.L.; Seeburg, P.H.
   #journal Mol. Endocrinol. (1990) 4:525-530
   #title The testicular receptor for follicle stimulating hormone:
      structure and functional expression of cloned cDNA.
   #cross-references MUID:91125358
   #accession A34548
   ##molecule_type mRNA
   ##residues 1-692 ##label SPR
   ##cross-references GB:L02842; NID:g204183; PID:g204184
REFERENCE A41729
   #authors Heckert, L.L.; Daley, I.J.; Griswold, M.D.
   #journal Mol. Endocrinol. (1992) 6:70-80
   #title Structural organization of the follicle-stimulating hormone
      receptor gene.
   #cross-references MUID:92149579
   #accession A41729
   ##status preliminary
   ##molecule_type DNA
   ##residues 1-692 ##label HEC
   ##cross-references GB:S81198; NID:g245344; PID:g245345
   ##note sequence inconsistent with the nucleotide translation
   ##note sequence extracted from NCBI backbone (NCBIN:81117,
      NCBIN:81119, NCBIN:81121, NCBIN:81171, NCBIN:81174,
      NCBIN:81178, NCBIN:81183, NCBIN:81185, NCBIN:81194,
      NCBIN:81198, NCBIP:81116)
REFERENCE A57562
   #authors Davis, D.; Liu, X.; Segaloff, D.L.
   #journal Mol. Endocrinol. (1995) 9:159-170
   #title Identification of the sites of N-linked glycosylation on the
      follicle-stimulating hormone (FSH) receptor and assessment
      of their role in FSH receptor function.
   #contents annotation; glycosylation sites
FUNCTION
   #description receptor that mediates the biochemical effects of follitropin
CLASSIFICATION #superfamily glycoprotein hormone receptor; leucine-rich
      alpha-2-glycoprotein repeat homology
KEYWORDS alternative splicing; G protein-coupled receptor;
      glycoprotein; hormone receptor; phosphoprotein;
      transmembrane protein



FEATURE
   1-15 #domain signal sequence #status predicted #label SIG\
   16-692 #product follitropin receptor #status predicted #label MAT\
   16-366 #domain extracellular hormone binding #status predicted #label EHB\
   56-70 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   71-95 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   96-120 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   121-145 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   146-169 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   172-193 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   194-218 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   367-387 #domain transmembrane #status predicted #label TM1\
   398-421 #domain transmembrane #status predicted #label TM2\
   443-465 #domain transmembrane #status predicted #label TM3\
   486-508 #domain transmembrane #status predicted #label TM4\
   529-550 #domain transmembrane #status predicted #label TM5\
   574-597 #domain transmembrane #status predicted #label TM6\
   609-630 #domain transmembrane #status predicted #label TM7\
   191,199,293 #binding_site carbohydrate (Asn) (covalent) #status predicted\
   554 #binding_site phosphate (Thr) (covalent) (by protein kinase C) #status predicted
   595 #binding_site phosphate (Ser) (covalent) (by protein kinase C) #status predicted
SUMMARY #length 692 #molecular-weight 77680 #checksum 8898

Query Match 30.3%; Score 221; DB 2; Length 692;
Best Local Similarity 32.4%; Pred. No. 6.69e-18;
Matches 34; Conservative 27; Mismatches 42; Indels 2;



Db 23 CHCSNRVFLCQDSKVTEIPTDLPRNAIELRFVLTKLRVIPKGSFAGFGDLEKIEISQNDV 82 

I III : I: : lll-ll :|:|: :: II |:|: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIRAIPAGAFTQYKKLKRIDISRNQI 62 

Db 83 LEVIEADVFSNLPKLHEIRIEKANNLLYINPEAFQNLPSLRYLLI 127 

: I :| I I I : : :|:: I |: I II: II: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105



RESULT 9
ENTRY JC2237 #type complete
TITLE follitropin receptor, testis - horse
ALTERNATE_NAMES
ORGANISM #formal_name Equus caballus #common_name domestic horse
DATE 28-Aug-1985 #sequence_revision 07-Oct-1994 #text_change 29-Jan-1999
ACCESSIONS JC2237; JC2370
REFERENCE JC2237
   #authors Robert, P.; Amsellem, S.; Christophe, S.; Benifla, J.L.;
      Bellet, D.; Roman, A.; Bidart, J.M.
   #journal Biochem. Biophys. Res. Commun. (1994) 201:201-207
   #title Cloning and sequencing of the equine testicular follitropin
      receptor.
   #cross-references MUID:94256980
   #accession JC2237
   ##molecule_type mRNA
   ##residues 1-694 ##label ROB
   ##cross-references GB:S70150; NID:g546896; PID:g546897
   ##experimental_source testis
CLASSIFICATION #superfamily glycoprotein hormone receptor; leucine-rich
      alpha-2-glycoprotein repeat homology
KEYWORDS glycoprotein; hormone receptor; transmembrane protein
FEATURE
   56-70 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   71-95 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   96-120 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   121-145 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   146-169 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   172-193 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   194-218 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   366-386 #domain transmembrane #status predicted #label TM1\
   398-420 #domain transmembrane #status predicted #label TM2\
   443-464 #domain transmembrane #status predicted #label TM3\
   485-507 #domain transmembrane #status predicted #label TM4\
   528-549 #domain transmembrane #status predicted #label TM5\
   573-596 #domain transmembrane #status predicted #label TM6\
   608-629 #domain transmembrane #status predicted #label TM7\
   191,199,268,293 #binding_site carbohydrate (Asn) (covalent) #status
SUMMARY #length 694 #molecular-weight 78004 #checksum 8235



Query Match 30.3%; Score 221; DB 2; Length 694;
Best Local Similarity 32.4%; Pred. No. 6.69e-18;
Matches 34; Conservative 27; Mismatches 42; Indels 2; Gaps 2;

Db 23 CHCSNRVFLCQESKVTEIPSDLPRNALELRFVLTRLRVIPRGAFSGFGDLEKIEISQNDV 82 

I III : I: : Ml::! :|:|: II III: : I :|:|| |:: 

Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIRAIPAGAFTQYKRLRRIDISKNQI 62 

Db 83 LEVIEANVFSNLPKLHEIRIEKANNLLYIDHDAFQNLPNLQYLLI 127 

: I :: I I I : : :|:: I |: I :|| ||: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 10 

ENTRY I45896 #type complete
TITLE follicle stimulating hormone receptor - bovine
ORGANISM #formal_name Bos primigenius taurus #common_name cattle
DATE 15-Oct-1996 #sequence_revision 15-Oct-1996 #text_change 06-Jun-1997
ACCESSIONS I45896
REFERENCE I45896
   #authors Houde, A.; Lambert, A.; Saumande, J.; Silversides, D.W.;
      Lussier, J.G.
   #journal Mol. Reprod. Dev. (1994) 39:127-135
   #title Structure of the bovine follicle-stimulating hormone receptor
      complementary DNA and expression in bovine tissues.
   #cross-references MUID:95127199
   #accession I45896
   ##status preliminary; translated from GB/EMBL/DDBJ
   ##molecule_type mRNA
   ##residues 1-695 ##label HOC
   ##cross-references GB:L22319; NID:g404671; PID:g404672
GENETICS
   #gene fshr
CLASSIFICATION #superfamily glycoprotein hormone receptor; leucine-rich
      alpha-2-glycoprotein repeat homology
SUMMARY #length 695 #molecular-weight 78084 #checksum 895

Query Match 30.1%; Score 220; DB 2; Length 695;
Best Local Similarity 32.4%; Pred. No. 9.58e-18;
Matches 34; Conservative 27; Mismatches 42; Indels 2; Gaps 2;

Db 23 CHCSNGVFLCQESRVTEIPSDLPRDAVELRFVLTRLRVIPRGAFSGFGDLEKIEISQNDV 82 

I III : h : lll:;|l l|:|: :: II llh : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKRLRRIDISKNQI 62 

Db 83 LEVIEANVFSNLPKLHEIRIEKANNLLYIDPDAFQNLPNLRYLLI 127 

: I :: I I I : : :|:: I |: I :|: ||: 
Qy 63 SD-IAPDAFQGLRSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 11
ENTRY JC1493 #type complete
TITLE follitropin receptor - sheep
ALTERNATE_NAMES follicle stimulating hormone receptor
ORGANISM #formal_name Ovis orientalis aries, Ovis ammon aries
      #common_name domestic sheep
DATE 03-Feb-1994 #sequence_revision 03-Feb-1994 #text_change 17-Mar-1999
ACCESSIONS JC1493; I47080
REFERENCE JC1493
   #authors Khan, H.; Yarney, T.A.; Sairam, M.R.
   #journal Biochem. Biophys. Res. Commun. (1993) 190:888-894
   #title Cloning of alternately spliced mRNA transcripts coding for
      variants of ovine testicular follitropin receptor lacking
      the G protein coupling domains.
   #cross-references MUID:93176195
   #accession JC1493
   ##molecule_type mRNA
   ##residues 1-695 ##label KHA
   ##experimental_source testis
REFERENCE I47080
   #authors Yarney, T.A.; Sairam, M.R.; Khan, H.; Ravindranath, N.;
      Payne, S.; Seidah, N.G.
   #journal Mol. Cell. Endocrinol. (1993) 93:219-226
   #title Molecular cloning and expression of the ovine testicular
      follicle stimulating hormone receptor.
   #cross-references MUID:93351750
   #accession I47080
   ##status preliminary; translated from GB/EMBL/DDBJ
   ##molecule_type mRNA
   ##residues 1-695 ##label YAR
   ##cross-references GB:L07302; NID:g165884; PID:g165885
GENETICS
   #gene FSH-R
CLASSIFICATION #superfamily glycoprotein hormone receptor; leucine-rich
      alpha-2-glycoprotein repeat homology
KEYWORDS G protein-coupled receptor; glycoprotein; transmembrane
      protein
FEATURE
   191,199 #binding_site carbohydrate (Asn) (covalent) #status predicted
SUMMARY #length 695 #molecular-weight 78237 #checksum 1112

Query Match 30.0%; Score 219; DB 2; Length 695;
Best Local Similarity 31.4%; Pred. No. 1.37e-17;
Matches 33; Conservative 28; Mismatches 42; Indels 2; Gaps 2;

Db 23 CHCSNGVFLCQDSKVTEMPSDLPRDAVELRFVLTKLRVIPEGAFSGFGDLEKIEISQNDV 82 

I III : I: : hh:ll l|:|: :: II III: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGRGLMEIPANIPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 83 LEVIEANVFSNLPKLHEIRIEKANNLLYIDPDAFQNLPNLRYLLI 127 

: I I I I : = :|:: I h I :|: lh 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 12 

ENTRY JN0898 #type complete
TITLE follitropin receptor precursor - crab-eating macaque
ALTERNATE_NAMES follicle-stimulating hormone receptor (FSHR)
ORGANISM #formal_name Macaca fascicularis #common_name crab-eating
      macaque
DATE 19-May-1994 #sequence_revision 19-May-1994 #text_change 13-Nov-1998
ACCESSIONS JN0898; S36452
REFERENCE JN0898
   #authors Gromoll, J.; Dankbar, B.; Sharma, R.S.; Nieschlag, E.
   #journal Biochem. Biophys. Res. Commun. (1993) 196:1066-1072
   #title Molecular cloning of the testicular follicle stimulating
      hormone receptor of the non human primate Macaca
      fascicularis and identification of multiple transcripts in
      the testis.
   #cross-references MUID:94071854
   #accession JN0898
   ##molecule_type mRNA
   ##residues 1-695 ##label GRO
   ##cross-references EMBL:X74454; NID:g396801; PID:g396802
   ##note the authors translated the codon AGT for residue 488 as
      Arg
FUNCTION
   #description receptor that mediates the biochemical effects of follitropin
CLASSIFICATION #superfamily glycoprotein hormone receptor; leucine-rich
      alpha-2-glycoprotein repeat homology
KEYWORDS G protein-coupled receptor; glycoprotein; hormone receptor;
      phosphoprotein; pituitary; transmembrane protein



FEATURE
   1-17 #domain signal sequence #status predicted #label SIG\
   18-695 #product follitropin receptor #status predicted #label PFH\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   121-145 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   146-169 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   172-193 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   194-218 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   367-387 #domain transmembrane #status predicted #label TM1\
   399-421 #domain transmembrane #status predicted #label TM2\
   444-465 #domain transmembrane #status predicted #label TM3\
   486-508 #domain transmembrane #status predicted #label TM4\
   529-550 #domain transmembrane #status predicted #label TM5\
   574-597 #domain transmembrane #status predicted #label TM6\
   609-630 #domain transmembrane #status predicted #label TM7\
   191,199,318 #binding_site carbohydrate (Asn) (covalent) #status predicted\
   555 #binding_site phosphate (Thr) (covalent) (by protein kinase C) #status predicted\
   596 #binding_site phosphate (Ser) (covalent) (by protein kinase C) #status predicted
SUMMARY #length 695 #molecular-weight 78343 #checksum 1011

Query Match 29.9%; Score 218; DB 2; Length 695;
Best Local Similarity 31.4%; Pred. No. 1.96e-17;
Matches 33; Conservative 28; Mismatches 42; Indels 2; Gaps 2;

Db 23 CHCSNRVFLCQESKVTEIPSDLPRNAIELRFVHTKLRVIQRGAFSGFGDLERIEISQNDV 82

I III : I: : lll::|| :|:|: : :: I III: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62

Db 83 LEVIEADVFSNLPKLHEIRIEKANNLLYINPEAFQNLPNLRYLLI 127

: I :| I I I : : :|:: I |: I :|: II; 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105



RESULT 13 

ENTRY QRHUFT #type complete
TITLE follitropin receptor precursor - human
ALTERNATE_NAMES follicle stimulating hormone receptor (FSHR)
CONTAINS follitropin receptor precursor long splice form; follitropin
      receptor precursor short splice form
ORGANISM #formal_name Homo sapiens #common_name man
DATE 30-Sep-1991 #sequence_revision 06-Sep-1996 #text_change 05-Sep-1997
ACCESSIONS I57661; I56448; PC1147; S30560; I57672; JN0122
REFERENCE I57661
   #authors Gromoll, J.; Dankbar, B.; Gudermann, T.
   #journal Mol. Cell. Endocrinol. (1994) 102:93-102
   #title Characterization of the 5' flanking region of the human
      follicle-stimulating hormone receptor gene.
   #cross-references MUID:95011044
   #accession I57661
   ##status translated from GB/EMBL/DDBJ
   ##molecule_type DNA
   ##residues 1-51 ##label GRO
   ##cross-references GB:S73199; NID:g685036; PID:g685037
REFERENCE I56448
   #authors Gromoll, J.; Ried, T.; Holtgreve-Grez, H.; Nieschlag, E.;
      Gudermann, T.
   #journal J. Mol. Endocrinol. (1994) 12:265-271
   #title Localization of the human FSH receptor to chromosome 2p21
      using a genomic probe comprising exon 10.
   #cross-references MUID:95000244
   #accession I56448
   ##status preliminary; translated from GB/EMBL/DDBJ
   ##molecule_type DNA
   ##residues 286-695 ##label GR2
   ##cross-references GB:S73526; NID:g688069; PID:g688070
REFERENCE PC1147
   #authors Gromoll, J.; Gudermann, T.; Nieschlag, E.
   #journal Biochem. Biophys. Res. Commun. (1992) 188:1077-1083
   #title Molecular cloning of a truncated isoform of the human
      follicle stimulating hormone receptor.
   #cross-references MUID:93075197
   #accession PC1147
   ##status nucleic acid sequence not shown
   ##molecule_type mRNA
   ##residues 1-223,286-294,'P',296-342 ##label GR3
   ##cross-references EMBL:X68044; NID:g31473; PID:g31474
   ##experimental_source testis
REFERENCE S30560
   #authors Gromoll, J.
   #submission submitted to the EMBL Data Library, August 1992
   #accession S30560
   ##molecule_type mRNA
   ##residues 1-12,'R',14-223,286-294,'P',296-342 ##label GR4
   ##cross-references EMBL:X68044; NID:g31473; PID:g31474
REFERENCE I57672
   #authors Kelton, C.A.; Cheng, S.V.; Nugent, N.P.; Schweickhardt, R.L.;
      Rosenthal, J.L.; Overton, S.A.; Wands, G.D.; Kuzeja, J.B.;
      Luchette, C.A.; Chappel, S.C.
   #journal Mol. Cell. Endocrinol. (1992) 89:141-151
   #title The cloning of the human follicle stimulating hormone
      receptor and its expression in COS-7, CHO, and Y-1 cells.
   #cross-references MUID:93246012
   #accession I57672
   ##status preliminary; translated from GB/EMBL/DDBJ
   ##molecule_type mRNA
   ##residues 1-679,'N',681-695 ##label KEL
   ##cross-references GB:S59900; NID:g300072; PID:g300073
REFERENCE JN0122
   #authors Minegishi, T.; Nakamura, K.; Takakura, Y.; Ibuki, Y.;
      Igarashi, M.
   #journal Biochem. Biophys. Res. Commun. (1991) 175:1125-1130
   #title Cloning and sequencing of human FSH receptor cDNA.
   #cross-references MUID:91222171
   #accession JN0122
   ##molecule_type mRNA
   ##residues 1-111,'T',113-196,'AV',199-306,'A',308-695 ##label MIN
   ##cross-references EMBL:M65085; NID:g182770; PID:g182771
GENETICS
   #gene GDB:FSHR
   ##cross-references GDB:127510; OMIM:136435
   #map_position 2p21-2p16
   #introns 223/3
   #note the exact position of the intron cannot be determined from
      the experimental data
FUNCTION
   #description receptor that mediates the biochemical effects of follitropin
CLASSIFICATION #superfamily glycoprotein hormone receptor; leucine-rich
      alpha-2-glycoprotein repeat homology
KEYWORDS alternative splicing; G protein-coupled receptor;
      glycoprotein; hormone receptor; phosphoprotein;
      transmembrane protein

FEATURE
   1-695 #product follitropin receptor precursor, long splice form #status predicted #label SPL1\
   1-223,286-695 #product follitropin receptor precursor, short splice form #status predicted #label SPL2\
   1-15 #domain signal sequence #status predicted #label SIG\
   16-695 #product follitropin receptor #status predicted #label MAT\
   16-366 #domain extracellular hormone binding #status predicted #label EHB\
   56-70 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   71-95 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   96-120 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   121-145 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   146-169 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   172-193 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   194-218 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   367-387 #domain transmembrane #status predicted #label TM1\
   398-421 #domain transmembrane #status predicted #label TM2\
   444-465 #domain transmembrane #status predicted #label TM3\
   486-508 #domain transmembrane #status predicted #label TM4\
   529-550 #domain transmembrane #status predicted #label TM5\
   574-597 #domain transmembrane #status predicted #label TM6\
   609-630 #domain transmembrane #status predicted #label TM7\
   191,199,293,318 #binding_site carbohydrate (Asn) (covalent) #status predicted\
   555 #binding_site phosphate (Thr) (covalent) (by protein kinase C) #status predicted\
   596 #binding_site phosphate (Ser) (covalent) (by protein kinase C) #status predicted
SUMMARY #length 695 #molecular-weight 78267 #checksum 2066

Query Match 29.7%; Score 217; DB 1; Length 695;
Best Local Similarity 32.4%; Pred. No. 2.80e-17;
Matches 34; Conservative 26; Mismatches 43; Indels 2; Gaps 2;

Db 23 CHCSNRVFLCQESKVTEIPSDLPRNAIELRFVLTKLRVIQKGAFSGFGDLEKIEISQNDV 82 

I III : I: : ll|::|| :|:|: :: I III: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 83 LEVIEADVFSNLPKLHEIRIEKANNLLYINPEAFQNLPNLQYLLI 127 
A : I :| ! I I : : :|:: I |: I :|| ||: 

Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105



RESULT 14 

ENTRY A58532 #type complete
TITLE glial cell membrane glycoprotein LIG-1 precursor - mouse
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 11-Apr-1997 #sequence_revision 11-Apr-1997 #text_change 17-Mar-1999
ACCESSIONS A58532
REFERENCE A58532
   #authors Suzuki, Y.; Sato, N.; Tohyama, M.; Wanaka, A.; Takagi, T.
   #journal J. Biol. Chem. (1996) 271:22522-22527
   #title cDNA cloning of a novel membrane glycoprotein that is
      expressed specifically in glial cells in the mouse brain;
      LIG-1, a protein with leucine-rich repeats and
      immunoglobulin-like domains.
   #cross-references MUID:96394313
   #accession A58532
   ##status preliminary; translated from GB/EMBL/DDBJ
   ##molecule_type mRNA
   ##residues 1-1091 ##label SUZ
   ##cross-references GB:D78572; NID:g1545806; PID:g1545807
CLASSIFICATION #superfamily leucine-rich alpha-2-glycoprotein repeat
      homology; proteoglycan amino-terminal homology;
      proteoglycan carboxyl-terminal homology



FEATURE
   36-61 #domain proteoglycan amino-terminal homology #label PAH\
   71-94 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   95-117 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   118-141 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   142-165 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   166-189 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   191-213 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   214-237 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   238-261 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   262-285 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   286-309 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
   310-333 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
   334-357 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
   358-381 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
   385-408 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
   409-432 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
   440-485 #domain proteoglycan carboxyl-terminal homology #label PCS1
SUMMARY #length 1091 #molecular-weight 119283 #checksum 7937

Query Match 29.3%; Score 214; DB 2; Length 1091;
Best Local Similarity 42.5%; Pred. No. 8.16e-17;
Matches 37; Conservative 19; Mismatches 27; Indels 4;



Db 330 AELS-SLSILRLSHNAISHIAEGAFKGLKSLRVLDLDHNEISGTIEDTSGAFTGLDNLSK 388 

hi: " =11 :hl h III I I: :|: |:|| I :: II II :|: 
Qy 22 ANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQISD-I--APDAFQGLKSLTS 78

Db 389 LTLFGNKIKSVAKRAFSGLESLEHLNL 415 

I hill :M I II II: I I 
Qy 79 LVLYGNKITEIAKGLFDGLVSLQLLLL 105



RESULT 15
ENTRY G02020 #type complete
TITLE p37NB - human
ORGANISM #formal_name Homo sapiens #common_name man
DATE 21-Dec-1996 #sequence_revision 06-Jun-1997 #text_change 16-Dec-1998
ACCESSIONS G02020
REFERENCE H00740
   #authors Kim, D.; LaQuaglia, M.; Yang, S.Y.
   #submission submitted to the EMBL Data Library, August 1995
   #accession G02020
   ##status preliminary; translated from GB/EMBL/DDBJ
   ##molecule_type mRNA
   ##residues 1-313 ##label RIM
   ##cross-references EMBL:U32907; NID:g1236328; PID:g1236329
CLASSIFICATION #superfamily proteoglycan carboxyl-terminal homology
FEATURE
   161-207 #domain proteoglycan carboxyl-terminal homology #label PCS1
SUMMARY #length 313 #molecular-weight 36287 #checksum 5976

Query Match 28.6%; Score 209; DB 2; Length 313;
Best Local Similarity 32.9%; Pred. No. 4.81e-16;
Matches 28; Conservative 27; Mismatches 30; Indels 0; Gaps 0;



Db 65 LDCQERKLVYVLPGWPQDLLHMLLARNKIRTLKNNMFSKFKKLKSLDLQQNEISKIESEA 124 
****************************************************************************

Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 08:43:11 1999; MasPar time 5.02 Seconds
591.610 Million cell updates/sec

Tabular output not generated.



Title: MJS-09-191-647-3 

Description: (1-105) from US09191647.pep

Perfect Score: 730 

Sequence: 1 SPCTCSNNIVDCRGKGLMEI ... ITEIAKGLFDGLVSLQLLLL 105

Scoring table: PAM 150 
Gap 11 

Searched: 77977 seqs, 28268293 residues 

Post-processing: Minimum Match 0%

Listing first 45 summaries 

Database: swiss-prot37 
1:swissprot

Statistics: Mean 43.647; Variance 83.058; scale 0.525 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
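
Two of the figures in this listing can be cross-checked from the header values. The Query Match percentage in the summaries appears to be the hit score as a percentage of the Perfect Score (730 for this query), and the reported rate of cell updates is close to query length times residues searched, divided by the MasPar time. The Python sketch below illustrates both checks; the function names are hypothetical, and both relationships are inferred from the numbers printed here rather than stated by the program.

```python
def query_match_percent(score, perfect_score):
    """Hit score as a percentage of the query's perfect score (assumed)."""
    return round(100.0 * score / perfect_score, 1)


def million_cell_updates_per_sec(query_len, db_residues, seconds):
    """Dynamic-programming cells filled per second, in millions (assumed)."""
    return query_len * db_residues / seconds / 1e6


# Header values from this run: Perfect Score 730, query length 105,
# 28268293 residues searched, MasPar time 5.02 seconds.
print(query_match_percent(459, 730))
print(query_match_percent(221, 730))
print(million_cell_updates_per_sec(105, 28268293, 5.02))
```

The throughput estimate differs slightly from the printed 591.610 because the MasPar time is only given to two decimal places.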



                              SUMMARIES

              Query
 No.  Score   Match  Length  DB  ID          Description             Pred. No.
   1    459    62.9    1480   1  SLIT_DROME  SLIT PROTEIN PRECURSOR  2.99e-69
   2    221    30.3     368   1  PGS1_HUMAN  BONE/CARTILAGE PROTEOG  7.55e-22
   3    221    30.3     369   1  PGS1_MOUSE  BONE/CARTILAGE PROTEOG  7.55e-22
   4    221    30.3     369   1  PGS1_RAT    BONE/CARTILAGE PROTEOG  7.55e-22
   5    221    30.3     369   1  PGS1_CANFA  BONE/CARTILAGE PROTEOG  7.55e-22
   6    221    30.3     692   1  FSHR_RAT    FOLLICLE STIMULATING H  7.55e-22
   7    221    30.3     694   1  FSHR_HORSE  FOLLICLE STIMULATING H  7.55e-22
   8    220    30.1     695   1  FSHR_BOVIN  FOLLICLE STIMULATING H  1.16e-21
   9    219    30.0     695   1  FSHR_SHEEP  FOLLICLE STIMULATING H  1.77e-21
  10    219    30.0     695   1  FSHR_PIG    FOLLICLE STIMULATING H  1.77e-21
  11    218    29.9     695   1  FSHR_MACFA  FOLLICLE STIMULATING H  2.72e-21
  12    217    29.7     695   1  FSHR_HUMAN  FOLLICLE STIMULATING H  4.16e-21
  13    215    29.5     687   1  FSHR_EQUAS  FOLLICLE STIMULATING H  9.72e-21
  14    211    28.9     369   1  PGS1_BOVIN  BONE/CARTILAGE PROTEOG  5.29e-20
  15    204    27.9     357   1  PGS2_CHICK  BONE PROTEOGLYCAN II P  1.01e-18
  16    200    27.4     605   1  ALS_PAPHA   INSULIN-LIKE GROWTH FA  5.37e-18
  17    200    27.4     662   1  GARP_HUMAN  GARP PROTEIN PRECURSOR  5.37e-18
  18    198    27.1     605   1  ALS_HUMAN   INSULIN-LIKE GROWTH FA  1.24e-17
  19    197    27.0     354   1  PGS2_RAT    BONE PROTEOGLYCAN II P  1.87e-17
  20    193    26.4     360   1  PGS2_BOVIN  BONE PROTEOGLYCAN II P  9.82e-17
  21    192    26.3     693   1  FSHR_CHICK  FOLLICLE STIMULATING H  1.48e-16
  22    190    26.0     560   1  GPV_HUMAN   PLATELET GLYCOPROTEIN   3.38e-16
  23    189    25.9     360   1  PGS2_CANFA  BONE PROTEOGLYCAN II P  5.10e-16
  24    189    25.9     567   1  GPV_RAT     PLATELET GLYCOPROTEIN   5.10e-16
  25    188    25.8     359   1  PGS2_HUMAN  BONE PROTEOGLYCAN II P  7.70e-16
  26    187    25.6     603   1  ALS_RAT     INSULIN-LIKE GROWTH FA  1.16e-15
  27    186    25.5     354   1  PGS2_MOUSE  BONE PROTEOGLYCAN II P  1.75e-15
  28    184    25.2     361   1  CHAD_BOVIN  CHONDROADHERIN PRECURS  3.95e-15
  29    182    24.9     603   1  ALS_MOUSE   INSULIN-LIKE GROWTH FA  ?.92e-15
  30    179    24.5     382   1  PARG_HUMAN  PROLARGIN PRECURSOR (P  3.01e-14
  31    175    24.0    1115   1  GPCR_LYMST  G-PROTEIN COUPLED RECE  1.51e-13
  32    170    23.3     536   1  CBP8_HUMAN  CARBOXYPEPTIDASE N 83   1.12e-12
  33    170    23.3    1134   1  CHAO_DROME                          1.12e-12
  34    168    23.0     567   1  GPV_MOUSE
  35    155    21.2     682   1  CONN_DROME  CONNECTIN PRECURSOR.    ?.03e-10
  36    152    20.8     626   1  GPBA_HUMAN  PLATELET GLYCOPROTEIN   1.28e-09
  37    148    20.3     343   1  LUM_CHICK   LUMICAN PRECURSOR (LUM  5.88e-09
  38    148    20.3     375   1  FMOD_BOVIN  FIBROMODULIN PRECURSOR  5.88e-09
  39    148    20.3     440   1  OMGP_HUMAN  OLIGODENDROCYTE-MYELIN  5.88e-09
  40    148    20.3     440   1  OMGP_MOUSE  OLIGODENDROCYTE-MYELIN  5.88e-09
  41    142    19.5     177   1  GPIX_HUMAN  PLATELET GLYCOPROTEIN   5.64e-08
  42    141    19.3     376   1  FMOD_RAT    FIBROMODULIN PRECURSOR  8.20e-08
  43    141    19.3     376   1  FMOD_MOUSE  FIBROMODULIN PRECURSOR  8.20e-08
  44    137    18.8     376   1  FMOD_HUMAN  FIBROMODULIN PRECURSOR  3.61e-07
  45    137    18.8     925   1  GLHR_ANTEL  PROBABLE GLYCOPROTEIN   3.61e-07



RESULT 1
ID   SLIT_DROME     STANDARD;      PRT;  1480 AA.
AC   P24014;
DT   01-MAR-1992 (REL. 21, CREATED)
DT   01-MAR-1992 (REL. 21, LAST SEQUENCE UPDATE)
DT   01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE)
DE   SLIT PROTEIN PRECURSOR.
GN   SLI.
OS   DROSOPHILA MELANOGASTER (FRUIT FLY).
OC   EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC   PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC   DROSOPHILIDAE; DROSOPHILA.
RN   [1]
RP   SEQUENCE FROM N.A.
RX   MEDLINE; 91099665.
RA   ROTHBERG J.M., JACOBS J.R., GOODMAN C.S., ARTAVANIS-TSAKONAS S.;
RT   "Slit: an extracellular protein necessary for development of midline
RT   glia and commissural axon pathways contains both EGF and LRR
RT   domains.";
RL   GENES DEV. 4:2169-2187(1990).
CC   -!- FUNCTION: NECESSARY FOR DEVELOPMENT OF MIDLINE GLIA AND
CC       COMMISSURAL AXON PATHWAYS. SLIT MAY INTERACT WITH EXTRACELLULAR
CC       MATRIX MOLECULES.
CC   -!- TISSUE SPECIFICITY: EXCRETED BY THE MIDLINE GLIA CELLS AND
CC       EVENTUALLY DISTRIBUTED ALONG THE AXONS.
CC   -!- ALTERNATIVE PRODUCTS: GIVES RISE TO 2 DISTINCT PROTEINS DIFFERING
CC       BY 11 AA AT THE C-TERMINUS OF THE LAST EGF REPEAT.
CC   -!- SIMILARITY: CONTAINS 7 EGF-LIKE DOMAINS.
CC   -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC       MANY PROTEINS. NUMBER IN THIS PROTEIN: 22. TWO BLOCK OF 6 LRR'S
CC       AND TWO BLOCKS OF 5 LRR'S.
CC   -!- SIMILARITY: CONTAINS A C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK).
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license@isb-sib.ch).
DR   EMBL; X53959; G8615; -.
DR   PIR; A36665; A36665.
DR   FLYBASE; FBgn0003425; sli.
DR   PROSITE; PS00010; ASX_HYDROXYL; 3.
DR   PROSITE; PS00022; EGF_1; 7.
DR   PROSITE; PS01185; CTCK_1; 1.
DR   PROSITE; PS01186; EGF_2; 5.
DR   PROSITE; PS01187; EGF_CA; 2.
DR   PROSITE; PS01225; CTCK_2; 1.
DR   PFAM; PF00007; Cys_knot; 1.
DR   PFAM; PF00008; EGF; 7.
DR   PFAM; PF00054; laminin_G; 1.
DR   PFAM; PF00560; LRR; 10.
DR   HSSP; P00740; 1IXA.
KW   NEUROGENESIS; GLYCOPROTEIN; SIGNAL; ALTERNATIVE SPLICING;
KW   EGF-LIKE DOMAIN; REPEAT; LEUCINE-REPEAT; DUPLICATION.



FT   SIGNAL        1     36
FT   CHAIN        37   1480       SLIT PROTEIN.
FT   DOMAIN       70    104       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      105    230       LEUCINE-RICH REPEATS (1ST REGION).
FT   DOMAIN      231    294       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN      295    326       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      327    452       LEUCINE-RICH REPEATS (2ND REGION).
FT   DOMAIN      453    518       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN      519    550       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      551    653       LEUCINE-RICH REPEATS (3RD REGION).
FT   DOMAIN      654    714       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN      715    746       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      747    848       LEUCINE-RICH REPEATS (4TH REGION).
FT   DOMAIN      849    910       CONSERVED C-FLANKING REGION OF THE LRR.
FT   REPEAT      105
FT   REPEAT      116    139
FT   REPEAT      140    163       LRR 1-3.
FT   REPEAT      164    187       LRR 1-4.
FT   REPEAT      188    211       LRR 1-5.
FT   REPEAT      212    230       LRR 1-6.
FT   REPEAT      327    337
FT   REPEAT      338    361       LRR 2-2.
FT   REPEAT      362    385
FT   REPEAT      386    409       LRR 2-4.
FT   REPEAT      410    433       LRR 2-5.
FT   REPEAT      434    452       LRR 2-6.
FT   REPEAT      551    562       LRR 3-1.
FT   REPEAT      563    586       LRR 3-2.
FT   REPEAT      587    610
FT   REPEAT      611    634       LRR 3-4.
FT   REPEAT      635    653       LRR 3-5.
FT   REPEAT      747    757
FT   REPEAT      758    781       LRR 4-2.
FT   REPEAT      782    805       LRR 4-3.
FT   REPEAT      806    829       LRR 4-4.
FT   REPEAT      830    848       LRR 4-5.
FT   DOMAIN      907    944       EGF-LIKE 1.
FT   DOMAIN      946    983       EGF-LIKE 2.
FT   DOMAIN      985   1022       EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1024   1062       EGF-LIKE 4.
FT   DOMAIN     1064   1100       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1111   1149       EGF-LIKE 6.
FT   DOMAIN     1353   1392       EGF-LIKE 7.
FT   DOMAIN     1409   1480       CTCK.
FT   VARSPLIC   1394   1404       MISSING (IN SHORT FORM).
FT   CARBOHYD    111    111       POTENTIAL.
FT   CARBOHYD    207    207       POTENTIAL.
FT   CARBOHYD    357    357       POTENTIAL.
FT   CARBOHYD    435    435       POTENTIAL.
FT   CARBOHYD    783    783       POTENTIAL.
FT   CARBOHYD    788    788       POTENTIAL.
FT   CARBOHYD    958    958       POTENTIAL.
FT   CARBOHYD    998    998       POTENTIAL.
FT   CARBOHYD   1060   1060       POTENTIAL.
FT   CARBOHYD   1159   1159       POTENTIAL.
FT   CARBOHYD   1175   1175       POTENTIAL.
FT   CARBOHYD   1243   1243       POTENTIAL.
FT   CARBOHYD   1292   1292       POTENTIAL.
FT   DISULFID    911    922       BY SIMILARITY.
FT   DISULFID    916    932       BY SIMILARITY.
FT   DISULFID    934    943       BY SIMILARITY.
FT   DISULFID    950    961       BY SIMILARITY.
FT   DISULFID    955    971       BY SIMILARITY.
FT   DISULFID    973              BY SIMILARITY.
FT   DISULFID    989   1001       BY SIMILARITY.
FT   DISULFID    995   1010       BY SIMILARITY.
FT   DISULFID   1012   1021       BY SIMILARITY.
FT   DISULFID   1028   1041       BY SIMILARITY.
FT   DISULFID   1035   1050       BY SIMILARITY.
FT   DISULFID   1052   1061       BY SIMILARITY.
FT   DISULFID   1068   1079       BY SIMILARITY.
FT   DISULFID   1073              BY SIMILARITY.
FT   DISULFID   1090              BY SIMILARITY.
FT   DISULFID   1115              BY SIMILARITY.
FT   DISULFID   1120   1137       BY SIMILARITY.
FT   DISULFID   1139   1148       BY SIMILARITY.
FT   DISULFID   1357   1368       BY SIMILARITY.
FT   DISULFID   1362   1380       BY SIMILARITY.
FT   DISULFID   1382   1391       BY SIMILARITY.
FT   DISULFID   1409   1443       BY SIMILARITY.
FT   DISULFID   1423   1457       BY SIMILARITY.
FT   DISULFID   1434   1473       BY SIMILARITY.
FT   DISULFID   1438   1475       BY SIMILARITY.
FT   DISULFID   1442   1479       BY SIMILARITY.

SQ SEQUENCE 1480 AA; 165752 MW; 2CD1C421 CRC32;

Query Match 62.9%; Score 459; DB 1; Length 1480;

Best Local Similarity 53.8%; Pred. No. 2.99e-69;

Matches 56; Conservative 24; Mismatches 24; Indels 0; Gaps 0;

Db 298 PCRCADGIVDCREKSLTSVPVTLPDDTTDVRLEQNFITELPPKSFSSFRRLRRIDLSNNN 357 

II h: Mill 1:1 :| II: ::||||| I :|: :|: :::|:|||:|:| 
Qy 2 PCTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQ 61 

Db 358 ISRIAHDALSGLKQLTTLVLYGNKIKDLPSGVFKGLGSLRLLLL 401 

II II lh III MINIMS ::: hi II |j:|||| 
Qy 62 ISDIAPDAFQGLKSLTSLVLYGNKITEIAKGLFDGLVSLQLLLL 105 
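
The "Best Local Similarity" figure printed with each result appears to equal the number of identical positions divided by the total number of alignment columns (Matches + Conservative + Mismatches + Indels). This reading is an inference from the counts printed in this report, not a documented definition from the search program. A minimal Python sketch under that assumption (the function name is hypothetical) reproduces the reported percentages:

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent identity over all alignment columns (assumed definition)."""
    columns = matches + conservative + mismatches + indels
    if columns == 0:
        raise ValueError("alignment has no columns")
    return round(100.0 * matches / columns, 1)


# Counts copied from the report's "Matches; Conservative; Mismatches; Indels" lines.
print(best_local_similarity(56, 24, 24, 0))  # RESULT 1 reports 53.8%
print(best_local_similarity(34, 25, 26, 3))  # the biglycan hits report 38.6%
print(best_local_similarity(34, 27, 42, 2))  # the FSH receptor hits report 32.4%
```

The same helper can be used to spot OCR damage in a statistics block: if the recomputed value disagrees with the printed percentage, one of the counts was probably misread.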



RESULT 2 

ID PGS1_HUMAN STANDARD; PRT; 368 AA.

AC P21810; P13247; 

DT 01-JAN-1990 (REL. 13, CREATED) 

DT 01-APR-1993 (REL. 25, LAST SEQUENCE UPDATE) 

DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)

DE BONE/CARTILAGE PROTEOGLYCAN I PRECURSOR (BIGLYCAN) (PG-S1). 

GN BGN. 

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=BONE;

RX MEDLINE; 89174714. 

RA FISHER L.W., TERMINE J.D., YOUNG M.F.; 

RT "Deduced protein sequence of bone small proteoglycan I (biglycan) 

RT shows homology with proteoglycan II (decorin) and several 

RT nonconnective tissue proteins in a variety of species."; 

RL J. BIOL. CHEM. 264:4571-4576(1989).

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 91317791. 

RA FISHER L.W., HEEGAARD A.M., VETTER U., VOGEL W., JUST W.,

RA TERMINE J.D., YOUNG M.F.;

RT "Human biglycan gene. Putative promoter, intron-exon junctions, and 

RT chromosomal localization."; 

RL J. BIOL. CHEM. 266:14371-14377(1991).

RN [3] 

RP SEQUENCE OF 38-57. 

RX MEDLINE; 90073579. 

RA ROUGHLEY P.J., WHITE R.J.; 

RT "Dermatan sulphate proteoglycans of human articular cartilage. The

RT properties of dermatan sulphate proteoglycans I and II.";

RL BIOCHEM. J. 262:823-827(1989). 

RN [4] 

RP SEQUENCE OF 38-66.

RX MEDLINE; 87250639.

RA FISHER L.W., HAWKINS G.R., TUROSS N., TERMINE J.D.; 

RT "Purification and partial characterization of small proteoglycans I 

RT and II, bone sialoproteins I and II, and osteonectin from the mineral 

RT compartment of developing human bone."; 

RL J. BIOL. CHEM. 262:9702-9708(1987). 

CC -!- TISSUE SPECIFICITY: FOUND IN THE EXTRACELLULAR MATRICES OF SEVERAL 

CC CONNECTIVE TISSUES, SPECIALLY IN ARTICULAR CARTILAGES.

CC -!- PTM: THE TWO GLYCOSAMINOGLYCAN CHAINS ATTACHED TO BIGLYCAN CAN BE

CC EITHER CHONDROITIN SULFATE OR DERMATAN SULFATE.

CC -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS

CC FAMILY. 

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN 
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 12. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; J04599; G306884; -. 

DR EMBL; M65153; G179433; ALT_SEQ.

DR EMBL; M65152; G179433; JOINED. 

DR PIR; A28457; A28457. 

DR PIR; A32458; A32458. 

DR PIR; A40757; A40757. 

DR PIR; S05639; S05639. 

DR MIM; 301870; -. 

DR PFAM; PF00560; LRR; 5.

KW GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN;

KW REPEAT; LEUCINE-REPEAT; SIGNAL.



FT   SIGNAL        1     19       POTENTIAL.
FT   PROPEP       20     37
FT   CHAIN        38    368       BONE/CARTILAGE PROTEOGLYCAN I.
FT   DOMAIN       71    342       LEUCINE-RICH REPEATS.
FT   REPEAT       71     94       LRR 1.
FT   REPEAT       95    115       LRR 2.
FT   REPEAT      116    139       LRR 3.
FT   REPEAT      140    163       LRR 4.
FT   REPEAT      164    184       LRR 5.
FT   REPEAT      185    210       LRR 6.
FT   REPEAT      211    230       LRR 7.
FT   REPEAT      231    254       LRR 8.
FT   REPEAT      255    275       LRR 9.
FT   REPEAT      276    301       LRR 10.
FT   REPEAT      302    320       LRR 11.
FT   REPEAT      335    342       LRR 12.
FT   CARBOHYD     42     42       GLYCOSAMINOGLYCAN.
FT   CARBOHYD     47     47       GLYCOSAMINOGLYCAN.
FT   CARBOHYD    270    270       POTENTIAL.
FT   CARBOHYD    311    311       POTENTIAL.
FT   DISULFID     63     76       BY SIMILARITY.
FT   DISULFID    321    354       BY SIMILARITY.
FT   CONFLICT    139    140       RL -> NV (IN REF. 1).
FT   CONFLICT    163    164       EL -> DV (IN REF. 1).
SQ SEQUENCE 368 AA; 41654 MW; 6820F8DF CRC32;



Query Match 30.3%; Score 221; DB 1; Length 368;

Best Local Similarity 38.6%; Pred. No. 7.55e-22;

Matches 34; Conservative 25; Mismatches 26; Indels 3; Gaps 2;

Db 150 LVEIPPNLPSSLVELRIHDNRIRKVPKGVFSGLRNMNCIEMGGNPLENSGFEPGAFDGLK 209

1:111:111 ::||:|: :| |: :| I |: :::: |::: I : I : I Ihlll
Qy 17 LMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI--SDIAPDAFQGLK 74

Db 210 -LNYLRISEAKLTGIPKDLPETLNELHL 236

I I 1:1 |:| I : I |:|
Qy 75 SLTSLVLYGNKITEIAKGLFDGLVSLQL 102



RESULT 3
ID PGS1_MOUSE STANDARD; PRT; 369 AA.
AC P28653; Q61355;
DT 01-DEC-1992 (REL. 24, CREATED)
DT 01-DEC-1992 (REL. 24, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE BONE/CARTILAGE PROTEOGLYCAN I PRECURSOR (BIGLYCAN) (PG-S1).
GN BGN.
OS MUS MUSCULUS (MOUSE).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=NIH SWISS; TISSUE=FIBROBLAST;
RA NAITOH Y., SUZUKI S.;
RL SUBMITTED (JUL-1990) TO EMBL/GENBANK/DDBJ DATA BANKS.
RN [2]
RP SEQUENCE FROM N.A.
RC STRAIN=NIH SWISS; TISSUE=EMBRYO;
RX MEDLINE; 94319093.
RA RAU W., JUST W., VETTER U., VOGEL W.;
RT "A dinucleotide repeat in the mouse biglycan gene (EST) on the X
RT chromosome.";
RL MAMM. GENOME 5:395-396(1994).
CC -!- TISSUE SPECIFICITY: FOUND IN THE EXTRACELLULAR MATRICES OF SEVERAL
CC     CONNECTIVE TISSUES, SPECIALLY IN ARTICULAR CARTILAGES.
CC -!- PTM: THE TWO GLYCOSAMINOGLYCAN CHAINS ATTACHED TO BIGLYCAN CAN BE
CC     EITHER CHONDROITIN SULFATE OR DERMATAN SULFATE (BY SIMILARITY).
CC -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS
CC     FAMILY.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC     MANY PROTEINS. NUMBER IN THIS PROTEIN: 12.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; X53928; G53667; -.
DR EMBL; L20276; G348962; -.
DR PIR; S20811; S20811.
DR MGD; MGI:88158; BGN.
DR PFAM; PF00560; LRR; 5.
KW GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN;
KW REPEAT; LEUCINE-REPEAT; SIGNAL.



FT   SIGNAL        1     19       POTENTIAL.
FT   PROPEP       20     37
FT   CHAIN        38    369       BONE/CARTILAGE PROTEOGLYCAN I.
FT   DOMAIN       72    343       LEUCINE-RICH REPEATS.
FT   REPEAT       72     95       LRR 1.
FT   REPEAT       96    116       LRR 2.
FT   REPEAT      117    140       LRR 3.
FT   REPEAT      141    164       LRR 4.
FT   REPEAT      165    185       LRR 5.
FT   REPEAT      186    211       LRR 6.
FT   REPEAT      212    231       LRR 7.
FT   REPEAT      232    255       LRR 8.
FT   REPEAT      256    276       LRR 9.
FT   REPEAT      277    302       LRR 10.
FT   REPEAT      303    321       LRR 11.
FT   REPEAT      336    343       LRR 12.
FT   CARBOHYD     42     42       GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT   CARBOHYD     48     48       GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT   CARBOHYD    271    271       POTENTIAL.
FT   CARBOHYD    312    312       POTENTIAL.
FT   DISULFID     64     77       BY SIMILARITY.
FT   DISULFID    322    355       BY SIMILARITY.
FT   CONFLICT     68     68       C -> W (IN REF. 2).
SQ SEQUENCE 369 AA; 41639 MW; ED21DD6B CRC32;

Query Match 30.3%; Score 221; DB 1; Length 369;

Best Local Similarity 38.6%; Pred. No. 7.55e-22; 

Matches 34; Conservative 25; Mismatches 26; Indels 3; Gaps 2; 

Db 151 LVEIPPNLPSSLVELRIHDNRIRKVPKGVFSGLRNMNCIEMGGNPLENSGFEPGAFDGLK 210 

hllhlll ::ll:|: :| I: :| I |: :::: |::: I : t : I ||:||| 
Qy 17 LMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI--SDIAPDAFQGLK 74 

Db 211 -LNYLRISEAKLTGIPKDLPETLNELHL 237 

II: hi hi I : I |:| 
Qy 75 SLTSLVLYGNKITEIAKGLFDGLVSLQL 102 



RESULT 4 

ID PGS1_RAT STANDARD; PRT; 369 AA.

AC P47853; 

DT 01-FEB-1996 (REL. 33, CREATED) 

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE) 

DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE) 

DE BONE/CARTILAGE PROTEOGLYCAN I PRECURSOR (BIGLYCAN) (PG-S1). 

GN BGN.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=VASCULAR SMOOTH MUSCLE;

RX MEDLINE; 91184222. 

RA DREHER K.L., ASUNDI V., MATZURA D., COWAN K.; 

RT "Vascular smooth muscle biglycan represents a highly conserved 

RT proteoglycan within the arterial wall."; 

RL EUR. J. CELL BIOL. 53:296-304(1990). 

CC -!- TISSUE SPECIFICITY: FOUND IN THE EXTRACELLULAR MATRICES OF SEVERAL 

CC CONNECTIVE TISSUES, SPECIALLY IN ARTICULAR CARTILAGES. 

CC -!- PTM: THE TWO GLYCOSAMINOGLYCAN CHAINS ATTACHED TO BIGLYCAN CAN BE
CC     EITHER CHONDROITIN SULFATE OR DERMATAN SULFATE (BY SIMILARITY).
CC -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS
CC     FAMILY.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC     MANY PROTEINS. NUMBER IN THIS PROTEIN: 12.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

DR EMBL; U17834; G600498; -.
DR PFAM; PF00560; LRR; 5.
KW GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN; 
KW REPEAT; LEUCINE-REPEAT; SIGNAL. 



FT   SIGNAL        1     19       POTENTIAL.
FT   PROPEP       20     37
FT   CHAIN        38    369       BONE/CARTILAGE PROTEOGLYCAN I.
FT   DOMAIN       72    343       LEUCINE-RICH REPEATS.
FT   REPEAT       72     95       LRR 1.
FT   REPEAT       96    116       LRR 2.
FT   REPEAT      117    140       LRR 3.
FT   REPEAT      141    164       LRR 4.
FT   REPEAT      165    185       LRR 5.
FT   REPEAT      186    211       LRR 6.
FT   REPEAT      212    231       LRR 7.
FT   REPEAT      232    255       LRR 8.
FT   REPEAT      256    276       LRR 9.
FT   REPEAT      277    302       LRR 10.
FT   REPEAT      303    321       LRR 11.
FT   REPEAT      336    343       LRR 12.
FT   CARBOHYD     42     42       GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT   CARBOHYD     48     48       GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT   CARBOHYD    271    271       POTENTIAL.
FT   CARBOHYD    312    312       POTENTIAL.
FT   DISULFID     64     77       BY SIMILARITY.
FT   DISULFID    322    355       BY SIMILARITY.
SQ SEQUENCE 369 AA; 41706 MW; 6555ECED CRC32;

Query Match 30.3%; Score 221; DB 1; Length 369;

Best Local Similarity 38.6%; Pred. No. 7.55e-22;

Matches 34; Conservative 25; Mismatches 26; Indels 3; Gaps 2; 

Db 151 LVEIPPNLPSSLVELRIHDNRIRKVPKGVFSGLRNMNCIEMGGNPLENSGFEPGAFDGLK 210 

1:111:111 ::M:|: :| |: :| I h :::: |:;: I ; I : I Ihlll 
Qy 17 LMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI--SDIAPDAFQGLK 74

Db 211 -LNYLRISEAKLTGIPKDLPETLNELHL 237 



Qy 75 SLTSLVLYGNKITEIAKGLFDGLVSLQL 102 



RESULT 5 

ID PGS1_CANFA STANDARD; PRT; 369 AA.
AC O02678;
DT 15-JUL-1998 (REL. 36, CREATED)
DT 15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE BONE/CARTILAGE PROTEOGLYCAN I PRECURSOR (BIGLYCAN) (PG-S1).
GN BGN.
OS CANIS FAMILIARIS (DOG).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC CARNIVORA; FISSIPEDIA; CANIDAE; CANIS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RA GLANT T.T.;

RL SUBMITTED (DEC-1996) TO EMBL/GENBANK/DDBJ DATA BANKS. 

CC -!- TISSUE SPECIFICITY: FOUND IN THE EXTRACELLULAR MATRICES OF SEVERAL
CC     CONNECTIVE TISSUES, SPECIALLY IN ARTICULAR CARTILAGES.
CC -!- PTM: THE TWO GLYCOSAMINOGLYCAN CHAINS ATTACHED TO BIGLYCAN CAN BE
CC     EITHER CHONDROITIN SULFATE OR DERMATAN SULFATE (BY SIMILARITY).
CC -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS
CC     FAMILY.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC     MANY PROTEINS. NUMBER IN THIS PROTEIN: 12.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; U83140; G1916846; -. 

DR PFAM; PF00560; LRR; 5. 

KW GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN; 

KW REPEAT; LEUCINE-REPEAT; SIGNAL. 



FT   SIGNAL        1     19       POTENTIAL.
FT   PROPEP       20     37       BY SIMILARITY.
FT   CHAIN        38    369       BONE/CARTILAGE PROTEOGLYCAN I.
FT   DOMAIN       72    343       LEUCINE-RICH REPEATS.
FT   REPEAT       72     95       LRR 1.
FT   REPEAT       96    116       LRR 2.
FT   REPEAT      117    140       LRR 3.
FT   REPEAT      141    164       LRR 4.
FT   REPEAT      165    185       LRR 5.
FT   REPEAT      186    211       LRR 6.
FT   REPEAT      212    231       LRR 7.
FT   REPEAT      232    255       LRR 8.
FT   REPEAT      256    276       LRR 9.
FT   REPEAT      277    302       LRR 10.
FT   REPEAT      303    321       LRR 11.
FT   REPEAT      336    343       LRR 12.
FT   CARBOHYD     42     42       GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT   CARBOHYD     48     48       GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT   CARBOHYD    271    271       POTENTIAL.
FT   CARBOHYD    312    312       POTENTIAL.



Tue Jun 1 10:16:15 1999 US-09-191-647-3.rsp Page 5



FT DISULFID 64 77 BY SIMILARITY.
FT DISULFID 322 355 BY SIMILARITY.
SQ SEQUENCE 369 AA; 41566 MW; F794CEEA CRC32;

Query Match 30.3%; Score 221; DB 1; Length 369;

Best Local Similarity 38.6%; Pred. No. 7.55e-22;

Matches 34; Conservative 25; Mismatches 26; Indels 3; Gaps 2;

Db 151 LVEIPPNLPSSLVELRIHDNRIRKVPKGVFSGLRNMNCIEMGGNPLENSGFEPGAFDGLK 210 

1=111=111 ==11=1= =1 I: =1 I 1= ==:= l==: 1=1=1 11=111 
Qy 17 LMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI--SDIAPDAFQGLK 74

Db 211 -LNYLRISEAKLTGIPKDLPETLNELHL 237 

II: |:| 1:1 I : I |:| 
Qy 75 SLTSLVLYGNKITEIAKGLFDGLVSLQL 102



RESULT 6
ID FSHR_RAT STANDARD; PRT; 692 AA.
AC P20395;
DT 01-FEB-1991 (REL. 17, CREATED)
DT 01-FEB-1991 (REL. 17, LAST SEQUENCE UPDATE)
DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE)
DE FOLLICLE STIMULATING HORMONE RECEPTOR PRECURSOR (FSH-R) (FOLLITROPIN
DE RECEPTOR).
GN FSHR.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=SERTOLI CELLS;
RX MEDLINE; 91125358.
RA SPRENGEL R., BRAUN T., NIKOLICS K., SEGALOFF D.L., SEEBURG P.H.;
RT "The testicular receptor for follicle stimulating hormone: structure
RT and functional expression of cloned cDNA.";
RL MOL. ENDOCRINOL. 4:525-530(1990).
RN [2]
RP SEQUENCE FROM N.A.
RX MEDLINE; 92149579.
RA HECKERT L.L., DALEY I.J., GRISWOLD M.D.;
RT "Structural organization of the follicle-stimulating hormone receptor
RT gene.";
RL MOL. ENDOCRINOL. 6:70-80(1992).
CC -!- FUNCTION: RECEPTOR FOR FOLLICLE STIMULATING HORMONE. THE ACTIVITY
CC     OF THIS RECEPTOR IS MEDIATED BY G PROTEINS WHICH ACTIVATE
CC     ADENYLATE CYCLASE.
CC -!- SUBCELLULAR LOCATION: INTEGRAL MEMBRANE PROTEIN.
CC -!- TISSUE SPECIFICITY: SERTOLI CELLS AND OVARIAN GRANULOSA CELLS.
CC -!- SIMILARITY: HIGHLY SIMILAR TO LSH AND TSH RECEPTORS.
CC -!- SIMILARITY: BELONGS TO FAMILY 1 OF G-PROTEIN COUPLED RECEPTORS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; L02842; G204184; -.
DR PIR; A34548; A34548.
DR PIR; A41729; A41729.
DR GCRDB; GCRJ234; -.
DR GCRDB; GCRJ456; -.
DR PROSITE; PS00237; G_PROTEIN_RECEPTOR; 1.
DR PFAM; PF00001; 7tm_1; 1.
DR PFAM; PF00560; LRR; 3.
DR HSSP; P23945; 1XUN.
KW G-PROTEIN COUPLED RECEPTOR; TRANSMEMBRANE; GLYCOPROTEIN; SIGNAL;
KW PHOSPHORYLATION; REPEAT; LEUCINE-REPEAT.
FT   SIGNAL        1     17       POTENTIAL.
FT   CHAIN        18    692       FOLLICLE STIMULATING HORMONE RECEPTOR.



FT   DOMAIN       18    365       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    366    386       1 (POTENTIAL).
FT   DOMAIN      387    397       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM    398    420       2 (POTENTIAL).
FT   DOMAIN      421    442       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    443    464       3 (POTENTIAL).
FT   DOMAIN      465    484       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM    485    507       4 (POTENTIAL).
FT   DOMAIN      508    527       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    528    549       5 (POTENTIAL).
FT   DOMAIN      550    572       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM    573    596       6 (POTENTIAL).
FT   DOMAIN      597    607       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    608    629       7 (POTENTIAL).
FT   DOMAIN      630    692       CYTOPLASMIC (POTENTIAL).
FT   CARBOHYD    191    191       POTENTIAL.
FT   CARBOHYD    199    199       POTENTIAL.
FT   CARBOHYD    293    293       POTENTIAL.
FT   DISULFID    441    516       BY SIMILARITY.
SQ SEQUENCE 692 AA; 77681 MW; BFD1CAD7 CRC32;



Query Match 30.3%; Score 221; DB 1; Length 692;

Best Local Similarity 32.4%; Pred. No. 7.55e-22;

Matches 34; Conservative 27; Mismatches 42; Indels 2; Gaps 2;

Db 23 CHCSNRVFLCQDSKVTEIPTDLPRNAIELRFVLTKLRVIPRGSFAGFGDLEKIEISQNDV 82 

I III = I: : lll::|| :|:|: :: II |:|: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 83 LEVIEADVFSNLPKLHEIRIEKANNLLYINPEAFQNLPSLRYLLI 127 

: I =1 I I I : : =|:= I I: I II: 11= 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105



RESULT 7

ID FSHR_HORSE STANDARD; PRT; 694 AA.

AC P47799; 

DT 01-FEB-1996 (REL. 33, CREATED) 

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE) 

DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE) 

DE FOLLICLE STIMULATING HORMONE RECEPTOR PRECURSOR (FSH-R) (FOLLITROPIN 

DE RECEPTOR) . 

GN FSHR.

OS EQUUS CABALLUS (HORSE). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC PERISSODACTYLA; EQUIDAE; EQUUS. 

RN [1] 

RP SEQUENCE FROM N.A.

RC TISSUE=TESTIS;

RX MEDLINE; 94256980. 

RA ROBERT P., AMSELLEM S., CHRISTOPHE S., BENIFLA J.L., BELLET D.,

RA ROMAN A., BIDART J.M.; 

RT "Cloning and sequencing of the equine testicular follitropin 

RT receptor."; 

RL BIOCHEM. BIOPHYS. RES. COMMUN. 201:201-207(1994). 

CC -!- FUNCTION: RECEPTOR FOR FOLLICLE STIMULATING HORMONE. THE ACTIVITY 

CC OF THIS RECEPTOR IS MEDIATED BY G PROTEINS WHICH ACTIVATE 

CC ADENYLATE CYCLASE. AMONG ALL MAMMALIAN FSH RECEPTORS, ON THE HORSE 

CC RECEPTOR DOES NOT BIND LH/CHORIONIC GONADOTROPHIN (CG).

CC -!- SUBCELLULAR LOCATION: INTEGRAL MEMBRANE PROTEIN. 

CC -!- SIMILARITY: HIGHLY SIMILAR TO LSH AND TSH RECEPTORS. 

CC -!- SIMILARITY: BELONGS TO FAMILY 1 OF G-PROTEIN COUPLED RECEPTORS. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; S70150; G546897; -. 

DR GCRDB; GCR_1251; -.






DR PROSITE; PS00237; G_PROTEIN_RECEPTOR; 1.
DR PFAM; PF00001; 7tm_1; 1.
DR PFAM; PF00560; LRR; 3.
DR HSSP; P23945; 1XUN.
KW G-PROTEIN COUPLED RECEPTOR; TRANSMEMBRANE; GLYCOPROTEIN;
KW PHOSPHORYLATION; REPEAT; LEUCINE-REPEAT.


FT   SIGNAL        1     17       POTENTIAL.
FT   CHAIN        18    694       FOLLICLE STIMULATING HORMONE RECEPTOR.
FT   DOMAIN       18    365       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    366    386       1 (POTENTIAL).
FT   DOMAIN      387    397       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM    398              2 (POTENTIAL).
FT   DOMAIN      421    442       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    443    464       3 (POTENTIAL).
FT   DOMAIN      465    484       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM    485    507       4 (POTENTIAL).
FT   DOMAIN      508    527       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    528    549       5 (POTENTIAL).
FT   DOMAIN      550    572       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM    573    596       6 (POTENTIAL).
FT   DOMAIN      597    607       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    608    629       7 (POTENTIAL).
FT   DOMAIN      630    694       CYTOPLASMIC (POTENTIAL).
FT   CARBOHYD    191    191       POTENTIAL.
FT   CARBOHYD    199    199       POTENTIAL.
FT   CARBOHYD    268    268       POTENTIAL.
FT   CARBOHYD    293    293       POTENTIAL.
SQ SEQUENCE 694 AA; 78004 MW; C20D75BE CRC32;



Query Match 30.3%; Score 221; DB 1; Length 694; 

Best Local Similarity 32.4%; Pred. No. 7.55e-22;

Matches 34; Conservative 27; Mismatches 42; Indels 2; Gaps 2; 

Db 23 CHCSNRVFLCQESKVTEIPSDLPRNALELRFVLTKLRVIPKGAFSGFGDLEKIEISQNDV 82 

I III : I: : lll::|| :|:|: :: || |||; : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62

Db 83 LEVIEANVFSNLPKLHEIRIEKANNLLYIDHDAFQNLPNLQYLLI 127 

: I :: I I I : : :|:: I |: I :|| ||: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105



RESULT 8
ID FSHR_BOVIN STANDARD; PRT; 695 AA.
AC P35376;
DT 01-JUN-1994 (REL. 29, CREATED)
DT 01-JUN-1994 (REL. 29, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE FOLLICLE STIMULATING HORMONE RECEPTOR PRECURSOR (FSH-R) (FOLLITROPIN
DE RECEPTOR).
GN FSHR.
OS BOS TAURUS (BOVINE).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC ARTIODACTYLA; RUMINANTIA; PECORA; BOVOIDEA; BOVIDAE; BOVINAE; BOS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=HOLSTEIN; TISSUE=OVARY, AND TESTIS;
RX MEDLINE; 95127199.
RA HOUDE A., LAMBERT A., SAUMANDE J., SILVERSIDES D.W., LUSSIER J.G.;
RT "Structure of the bovine follicle-stimulating hormone receptor
RT complementary DNA and expression in bovine tissues.";
RL MOL. REPROD. DEV. 39:127-135(1994).
CC -!- FUNCTION: RECEPTOR FOR FOLLICLE STIMULATING HORMONE. THE ACTIVITY
CC     OF THIS RECEPTOR IS MEDIATED BY G PROTEINS WHICH ACTIVATE
CC     ADENYLATE CYCLASE.
CC -!- SUBCELLULAR LOCATION: INTEGRAL MEMBRANE PROTEIN.
CC -!- SIMILARITY: HIGHLY SIMILAR TO LSH AND TSH RECEPTORS.
CC -!- SIMILARITY: BELONGS TO FAMILY 1 OF G-PROTEIN COUPLED RECEPTORS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; L22319; G404672; -.
DR GCRDB; GCRJ766; -.
DR PROSITE; PS00237; G_PROTEIN_RECEPTOR; 1.
DR PFAM; PF00001; 7tm_1; 1.
DR PFAM; PF00560; LRR; 3.
DR HSSP; P23945; 1XUN.
KW G-PROTEIN COUPLED RECEPTOR; TRANSMEMBRANE; GLYCOPROTEIN; SIGNAL;
KW PHOSPHORYLATION; REPEAT; LEUCINE-REPEAT.


FT   SIGNAL        1     17       POTENTIAL.
FT   CHAIN        18    695       FOLLICLE STIMULATING HORMONE RECEPTOR.
FT   DOMAIN       18    366       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    367    387       1 (POTENTIAL).
FT   DOMAIN      388    398       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM    399    421       2 (POTENTIAL).
FT   DOMAIN      422    443       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    444    465       3 (POTENTIAL).
FT   DOMAIN      466    485       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM    486    508       4 (POTENTIAL).
FT   DOMAIN      509    528       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    529    550       5 (POTENTIAL).
FT   DOMAIN      551    573       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM    574    597       6 (POTENTIAL).
FT   DOMAIN      598    608       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    609    630       7 (POTENTIAL).
FT   DOMAIN      631    695       CYTOPLASMIC (POTENTIAL).
FT   CARBOHYD    191    191       POTENTIAL.
FT   CARBOHYD    199    199       POTENTIAL.
FT   CARBOHYD    293    293       POTENTIAL.
SQ SEQUENCE 695 AA; 78084 MW; 93AEC243 CRC32;



Query Match 30.1%; Score 220; DB 1; Length 695;

Best Local Similarity 32.4%; Pred. No. 1.16e-21;

Matches 34; Conservative 27; Mismatches 42; Indels 2; Gaps 2; 

Db 23 CHCSNGVFLCQESKVTEIPSDLPRDAVELRFVLTKLRVIPKGAFSGFGDLEKIEISQNDV 82 

I III : I: : lll::|l lh|: :: II III: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 83 LEVIEANVFSNLPKLHEIRIEKANNLLYIDPDAFQNLPNLRYLLI 127 

: I :: I I I : : :|:: I |: I :|: ||: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105



RESULT 9 

ID FSHR_SHEEP STANDARD; PRT; 695 AA. 

AC P35379; 

DT 01-JUN-1994 (REL. 29, CREATED) 

DT 01-JUN-1994 (REL. 29, LAST SEQUENCE UPDATE) 

DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE) 

DE FOLLICLE STIMULATING HORMONE RECEPTOR PRECURSOR (FSH-R) (FOLLITROPIN 

DE RECEPTOR).

GN FSHR.

OS OVIS ARIES (SHEEP).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC ARTIODACTYLA; RUMINANTIA; PECORA; BOVOIDEA; BOVIDAE; CAPRINAE; OVIS.

RN [1]

RP SEQUENCE FROM N.A.

RC TISSUE=TESTIS;

RX MEDLINE; 93351750.

RA YARNEY T.A., SAIRAM M.R., KHAN H., RAVINDRANATH N., PAYNE S.,

RA SEIDAH N.G.;

RT "Molecular cloning and expression of the ovine testicular follicle

RT stimulating hormone receptor.";

RL MOL. CELL. ENDOCRINOL. 93:219-226(1993).

RN [2]

RP SEQUENCE FROM N.A.

RC TISSUE=TESTIS;

RX MEDLINE; 93176195.

RA KHAN H., YARNEY T.A., SAIRAM M.R.;

RT "Cloning of alternately spliced mRNA transcripts coding for variants

RT of ovine testicular follitropin receptor lacking the G protein

RT coupling domains.";

RL BIOCHEM. BIOPHYS. RES. COMMUN. 190:888-894(1993).

CC -!- FUNCTION: RECEPTOR FOR FOLLICLE STIMULATING HORMONE. THE ACTIVITY

CC OF THIS RECEPTOR IS MEDIATED BY G PROTEINS WHICH ACTIVATE

CC ADENYLATE CYCLASE.

CC -!- SUBCELLULAR LOCATION: INTEGRAL MEMBRANE PROTEIN.

CC -!- SIMILARITY: HIGHLY SIMILAR TO LSH AND TSH RECEPTORS.

CC -!- SIMILARITY: BELONGS TO FAMILY 1 OF G-PROTEIN COUPLED RECEPTORS.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

DR EMBL; L07302; G165885; -.

DR PIR; JC1493; JC1493.

DR GCRDB; GCRJ496; -.

DR PROSITE; PS00237; G_PROTEIN_RECEPTOR; 1.

DR PFAM; PF00001; 7tm_1; 1.

DR PFAM; PF00560; LRR; 3.

DR HSSP; P23945; 1XUN.

KW G-PROTEIN COUPLED RECEPTOR; TRANSMEMBRANE; GLYCOPROTEIN; SIGNAL;

KW PHOSPHORYLATION; REPEAT; LEUCINE-REPEAT.



FT SIGNAL 1 17 POTENTIAL.
FT CHAIN 18 695 FOLLICLE STIMULATING HORMONE RECEPTOR.
FT DOMAIN 18 366 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 367 387 1 (POTENTIAL).
FT DOMAIN 388 398 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 399 421 2 (POTENTIAL).
FT DOMAIN 422 443 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 444 465 3 (POTENTIAL).
FT DOMAIN 466 485 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 486 508 4 (POTENTIAL).
FT DOMAIN 509 528 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 529 550 5 (POTENTIAL).
FT DOMAIN 551 573 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 574 597 6 (POTENTIAL).
FT DOMAIN 598 608 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 609 630 7 (POTENTIAL).
FT DOMAIN 631 695 CYTOPLASMIC (POTENTIAL).
FT DISULFID 442 517 BY SIMILARITY.
FT CARBOHYD 191 191 POTENTIAL.
FT CARBOHYD 199 199 POTENTIAL.
FT CARBOHYD 293 293 POTENTIAL.
SQ SEQUENCE 695 AA; 78237 MW; 5C0B07D6 CRC32;



Query Match 30.0%; Score 219; DB 1; Length 695;

Best Local Similarity 31.4%; Pred. No. 1.77e-21;

Matches 33; Conservative 28; Mismatches 42; Indels 2; Gaps 2;

Db 23 CHCSNGVFLCQDSKVTEMPSDLPRDAVELRFVLTKLRVIPEGAFSGFGDLEKIEISQNDV 82 

I III : I: : l:|::|| Ihh :: II III: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62

Db 83 LEVIEANVFSNLPKLHEIRIEKANNLLYIDPDAFQNLPNLRYLLI 127 

: I :: I I I : : :|:: I |: I :|: II: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105



RESULT 10 

ID FSHR_PIG STANDARD; PRT; 695 AA.

AC P49059; 077514;

DT 01-FEB-1996 (REL. 33, CREATED)

DT 15-DEC-1998 (REL. 37, LAST SEQUENCE UPDATE)

DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE)

DE FOLLICLE STIMULATING HORMONE RECEPTOR PRECURSOR (FSH-R) (FOLLITROPIN

DE RECEPTOR).

GN FSHR.



OS SUS SCROFA (PIG). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC ARTIODACTYLA; SUIFORMES; SUINA; SUIDAE; SUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=OVARY;

RX MEDLINE; 96011644.

RA REMY J.J., LAHBIB-MANSAIS Y., YERLE M., BOZON V., COUTURE L.,

RA PAJOT E., GREBERT D., SALESSE R.;

RT "The porcine follitropin receptor: cDNA cloning, functional

RT expression and chromosomal localization of the gene.";

RL GENE 163:257-261(1995).

RN [2]

RP SEQUENCE FROM N.A.

RC TISSUE=OVARY;

RA WANG Y.F., MEYER K.B., SCHMIDT K., WAN S.J., DEGEN S.J.F.,

RA LA BARBERA A.R.; 

RT "Porcine follicle-stimulating hormone receptor."; 

RL SUBMITTED (SEP-1997) TO EMBL/GENBANK/DDBJ DATA BANKS. 

CC -!- FUNCTION: RECEPTOR FOR FOLLICLE STIMULATING HORMONE. THE ACTIVITY

CC OF THIS RECEPTOR IS MEDIATED BY G PROTEINS WHICH ACTIVATE

CC ADENYLATE CYCLASE.

CC -!- SUBCELLULAR LOCATION: INTEGRAL MEMBRANE PROTEIN.

CC -!- SIMILARITY: HIGHLY SIMILAR TO LSH AND TSH RECEPTORS.

CC -!- SIMILARITY: BELONGS TO FAMILY 1 OF G-PROTEIN COUPLED RECEPTORS. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; L31966; G1066833; -. 

DR EMBL; AF025377; G3282511; -. 

DR GCRDB; GCR_1561; -.

DR PROSITE; PS00237; G_PROTEIN_RECEPTOR; 1.

DR PFAM; PF00001; 7tm_1; 1.

DR PFAM; PF00560; LRR; 3. 

DR HSSP; P23945; 1XUN. 



KW G-PROTEIN COUPLED RECEPTOR; TRANSMEMBRANE; GLYCOPROTEIN; SIGNAL;
KW PHOSPHORYLATION; REPEAT; LEUCINE-REPEAT.
FT SIGNAL 1 17 POTENTIAL.
FT CHAIN 18 695 FOLLICLE STIMULATING HORMONE RECEPTOR.
FT DOMAIN 18 366 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 367 387 1 (POTENTIAL).
FT DOMAIN 388 398 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 399 421 2 (POTENTIAL).
FT DOMAIN 422 443 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 444 465 3 (POTENTIAL).
FT DOMAIN 466 485 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 486 508 4 (POTENTIAL).
FT DOMAIN 509 528 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 529 550 5 (POTENTIAL).
FT DOMAIN 551 573 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 574 597 6 (POTENTIAL).
FT DOMAIN 598 608 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 609 630 7 (POTENTIAL).
FT DOMAIN 631 695 CYTOPLASMIC (POTENTIAL).
FT CARBOHYD 191 191 POTENTIAL.
FT CARBOHYD 199 199 POTENTIAL.
FT CARBOHYD 293 293 POTENTIAL.
FT DISULFID 442 517 BY SIMILARITY.
FT CONFLICT 2 2 S -> A (IN REF. 1).
FT CONFLICT 13 13 T -> S (IN REF. 1).
FT CONFLICT 60 60 V -> A (IN REF. 1).
FT CONFLICT 166 166 V -> M (IN REF. 1).
FT CONFLICT 215 215 Q -> H (IN REF. 1).
FT CONFLICT 247 247 K -> R (IN REF. 1).
FT CONFLICT 257 257 S -> T (IN REF. 1).
FT CONFLICT 334 334 D -> N (IN REF. 1).
FT CONFLICT 349 349 E -> K (IN REF. 1).
FT CONFLICT 352 352 [illegible] (IN REF. 1).
FT CONFLICT 383 383 V -> [illegible] (IN REF. 1).
FT CONFLICT 407 407 A -> T (IN REF. 1).
FT CONFLICT 421 421 V -> I (IN REF. 1).
FT CONFLICT 427 427 T -> S (IN REF. 1).
FT CONFLICT 435 435 D -> N (IN REF. 1).
FT CONFLICT 483 483 L -> V (IN REF. 1).
FT CONFLICT 550 550 T -> I (IN REF. 1).
FT CONFLICT 586 586 A -> V (IN REF. 1).
FT CONFLICT 607 607 S -> L (IN REF. 1).
FT CONFLICT 691 691 R -> H (IN REF. 1).
SQ SEQUENCE 695 AA; 78172 MW; 10A3EA81 CRC32;


Query Match 30.0%; Score 219; DB 1; Length 695;

Best Local Similarity 32.4%; Pred. No. 1.77e-21;

Matches 34; Conservative 27; Mismatches 42; Indels 2; Gaps 2;

Db 23 CHCSNGVFLCQESKVTEIPPDLPRNAVELRFVLTKLRVIPKGAFSGFGDLEKIEISQNDV 82 

I III : I: : lll::|| ll:|: :: II III: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 83 LEVIEANVFSNLPKLHEIRIEKANNLLYIDPDAFQNLPNLRYLLI 127
: I :: I I I : : :|:: I |: I :|: ||:' 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 11 

ID FSHR_MACFA STANDARD; PRT; 695 AA.

AC P32212;

DT 01-OCT-1993 (REL. 27, CREATED)

DT 01-OCT-1993 (REL. 27, LAST SEQUENCE UPDATE)

DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE)

DE FOLLICLE STIMULATING HORMONE RECEPTOR PRECURSOR (FSH-R) (FOLLITROPIN

DE RECEPTOR).

GN FSHR.

OS MACACA FASCICULARIS (CRAB EATING MACAQUE) (CYNOMOLGUS MONKEY).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC PRIMATES; CATARRHINI; CERCOPITHECIDAE; CERCOPITHECINAE; MACACA.

RN [1]

RP SEQUENCE FROM N.A.

RC TISSUE=TESTIS;

RX MEDLINE; 94071854.

RA GROMOLL J., DANKBAR B., SHARMA R.S., NIESCHLAG E.;

RT "Molecular cloning of the testicular follicle stimulating hormone 

RT receptor of the non human primate Macaca fascicularis and 

RT identification of multiple transcripts in the testis."; 

RL BIOCHEM. BIOPHYS. RES. COMMUN. 196:1066-1072(1993). 

CC -!- FUNCTION: RECEPTOR FOR FOLLICLE STIMULATING HORMONE. THE ACTIVITY

CC OF THIS RECEPTOR IS MEDIATED BY G PROTEINS WHICH ACTIVATE

CC ADENYLATE CYCLASE.

CC -!- SUBCELLULAR LOCATION: INTEGRAL MEMBRANE PROTEIN.

CC -!- SIMILARITY: HIGHLY SIMILAR TO LSH AND TSH RECEPTORS.

CC -!- SIMILARITY: BELONGS TO FAMILY 1 OF G-PROTEIN COUPLED RECEPTORS.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; X74454; G396802; -.

DR PIR; S36452; S36452.

DR PIR; JN0898; JN0898.

DR GCRDB; GCRJ653; -.

DR PROSITE; PS00237; G_PROTEIN_RECEPTOR; 1.

DR PFAM; PF00001; 7tm_1; 1.

DR PFAM; PF00560; LRR; 3.

DR HSSP; P23945; 1XUN.

KW G-PROTEIN COUPLED RECEPTOR; TRANSMEMBRANE; GLYCOPROTEIN; SIGNAL;

KW PHOSPHORYLATION; REPEAT; LEUCINE-REPEAT.

FT SIGNAL 1 17 POTENTIAL.
FT CHAIN 18 695 FOLLICLE STIMULATING HORMONE RECEPTOR.
FT DOMAIN 18 366 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 367 387 1 (POTENTIAL).
FT DOMAIN 388 398 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 399 421 2 (POTENTIAL).
FT DOMAIN 422 443 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 444 465 3 (POTENTIAL).
FT DOMAIN 466 485 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 486 508 4 (POTENTIAL).
FT DOMAIN 509 528 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 529 550 5 (POTENTIAL).
FT DOMAIN 551 573 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 574 597 6 (POTENTIAL).
FT DOMAIN 598 608 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 609 630 7 (POTENTIAL).
FT DOMAIN 631 695 CYTOPLASMIC (POTENTIAL).
FT CARBOHYD 191 191 POTENTIAL.
FT CARBOHYD 199 199 POTENTIAL.
FT CARBOHYD 293 293 POTENTIAL.
FT CARBOHYD 318 318 POTENTIAL.
FT DISULFID 442 517 BY SIMILARITY.
SQ SEQUENCE 695 AA; 78343 MW; ED3151A9 CRC32;



Query Match 29.9%; Score 218; DB 1; Length 695; 

Best Local Similarity 31.4%; Pred. No. 2.72e-21; 

Matches 33; Conservative 28; Mismatches 42; Indels 2; Gaps 2; 

Db 23 CHCSNRVFLCQESKVTEIPSDLPRNAIELRFVHTKLRVIQKGAFSGFGDLEKIEISQNDV 82 

I III : I: : lll::|l :|:|: : :: I III: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 83 LEVIEADVFSNLPKLHEIRIEKANNLLYINPEAFQNLPNLRYLLI 127 

: I :| I I I : : :|:: I |: I :|: II: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 12 

ID FSHR_HUMAN STANDARD; PRT; 695 AA.

AC P23945;

DT 01-MAR-1992 (REL. 21, CREATED)

DT 01-JUN-1994 (REL. 29, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE FOLLICLE STIMULATING HORMONE RECEPTOR PRECURSOR (FSH-R) (FOLLITROPIN

DE RECEPTOR).

GN FSHR.

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=OVARY;

RX MEDLINE; 91222171.

RA MINEGISH T., NAKAMURA K., TAKAKURA Y., IBUKI Y., IGARASHI M.;

RT "Cloning and sequencing of human FSH receptor cDNA.";

RL BIOCHEM. BIOPHYS. RES. COMMUN. 175:1125-1130(1991).

RN [2]

RP SEQUENCE FROM N.A.

RC TISSUE=TESTIS;

RX MEDLINE; 93246012.

RA KELTON C.A., CHENG S.V., NUGENT N.P., SCHWEICKHARDT R.L.,

RA ROSENTHAL J.L., OVERTON S.A., WANDS G.D., KUZEJA J.B., LUCHETTE C.A.,

RA CHAPPEL S.C.;

RT "The cloning of the human follicle stimulating hormone receptor and

RT its expression in COS-7, CHO, and Y-1 cells.";

RL MOL. CELL. ENDOCRINOL. 89:141-151(1992).

RN [3]

RP SEQUENCE FROM N.A.

RA TILLY L.T., AIHARA T., NISHIMORI K., JAI X.-C., BILLIG H.,

RA KOWALSKI K.I., PERLAS E.A., HSUEH A.J.;

RL SUBMITTED (XXX-1992) TO EMBL/GENBANK/DDBJ DATA BANKS.

RN [4]

RP SEQUENCE OF 1-342 FROM N.A.

RC TISSUE=TESTIS;

RX MEDLINE; 93075197.

RA GROMOLL J., GUDERMANN T., NIESCHLAG E.;

RT "Molecular cloning of a truncated isoform of the human follicle

RT stimulating hormone receptor.";

RL BIOCHEM. BIOPHYS. RES. COMMUN. 188:1077-1083(1992).

RN [5] 

RP SEQUENCE OF 1-51 FROM N.A. 

RX MEDLINE; 95011044. 

RA GROMOLL J., DANKBAR B., GUDERMANN T.; 

RT "Characterization of the 5' flanking region of the human follicle- 
RT stimulating hormone receptor gene."; 
RL MOL. CELL. ENDOCRINOL. 102:93-102(1994). 
RN [6] 

RP 3D-STRUCTURE MODELLING OF 49-228.
RX MEDLINE; 96363672.

RA JIANG X., DREANO M., BUCKLER D.R., CHENG S., YTHIER A., WU H.,
RA HENDRICKSON W.A., EL TAYAR N.;
RT "Structural predictions for the ligand-binding region of glycoprotein
RT hormone receptors and the nature of hormone-receptor interactions.";
RL STRUCTURE 3:1341-1353(1995).

CC -!- FUNCTION: RECEPTOR FOR FOLLICLE STIMULATING HORMONE. THE ACTIVITY
CC OF THIS RECEPTOR IS MEDIATED BY G PROTEINS WHICH ACTIVATE
CC ADENYLATE CYCLASE.
CC -!- SUBCELLULAR LOCATION: INTEGRAL MEMBRANE PROTEIN.
CC -!- TISSUE SPECIFICITY: SERTOLI CELLS AND OVARIAN GRANULOSA CELLS.
CC -!- ALTERNATIVE PRODUCTS: A SHORT FORM OF THE TESTICULAR PROTEIN IS
CC PRODUCED BY ALTERNATIVE SPLICING OF THE SAME GENE. THE SEQUENCE
CC SHOWN HERE IS THAT OF THE LONG TESTICULAR PROTEIN.
CC -!- SIMILARITY: HIGHLY SIMILAR TO LSH AND TSH RECEPTORS.
CC -!- SIMILARITY: BELONGS TO FAMILY 1 OF G-PROTEIN COUPLED RECEPTORS.

CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

DR EMBL; M65085; G182771; -.
DR EMBL; S59900; G300073; -.
DR EMBL; M95489; G182773; -.
DR EMBL; X68044; G31474; -.
DR EMBL; S73199; G685037; -.
DR PIR; JN0122; JN0122.
DR PDB; 1XUN; 15-MAY-97.
DR GCRDB; GCRJ071; -.
DR GCRDB; GCRJ404; -.
DR GCRDB; GCRJ588; -.
DR GCRDB; GCRJ690; -.
DR MIM; 136435; -.
DR PROSITE; PS00237; G_PROTEIN_RECEPTOR; 1.
DR PFAM; PF00001; 7tm_1; 1.
DR PFAM; PF00560; LRR; 3.
KW G-PROTEIN COUPLED RECEPTOR; TRANSMEMBRANE; GLYCOPROTEIN; SIGNAL;
KW PHOSPHORYLATION; REPEAT; LEUCINE-REPEAT; ALTERNATIVE SPLICING;
KW 3D-STRUCTURE.



FT SIGNAL 1 17 POTENTIAL.
FT CHAIN 18 695 FOLLICLE STIMULATING HORMONE RECEPTOR.
FT DOMAIN 18 366 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 367 387 1 (POTENTIAL).
FT DOMAIN 388 398 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 399 421 2 (POTENTIAL).
FT DOMAIN 422 443 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 444 465 3 (POTENTIAL).
FT DOMAIN 466 485 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 486 508 4 (POTENTIAL).
FT DOMAIN 509 528 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 529 550 5 (POTENTIAL).
FT DOMAIN 551 573 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 574 597 6 (POTENTIAL).
FT DOMAIN 598 608 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 609 630 7 (POTENTIAL).
FT DOMAIN 631 695 CYTOPLASMIC (POTENTIAL).
FT CARBOHYD 191 191 POTENTIAL.
FT CARBOHYD 199 199 POTENTIAL.
FT CARBOHYD 293 293 POTENTIAL.
FT CARBOHYD 318 318 POTENTIAL.
FT DISULFID 442 517 BY SIMILARITY.
FT VARSPLIC 224 285 MISSING (IN SHORT TESTICULAR FORM).
FT VARSPLIC 342 695 MISSING (IN SHORT TESTICULAR FORM).
FT CONFLICT 13 13 S -> R (IN REF. 4).
FT CONFLICT 112 112 N -> T (IN REF. 1).
FT CONFLICT 197 198 EL -> AV (IN REF. 1).
FT CONFLICT 295 295 S -> P (IN REF. 4).
FT CONFLICT 307 307 T -> A (IN REF. 1).
FT CONFLICT 680 680 N -> S (IN REF. 1).
SQ SEQUENCE 695 AA; 78294 MW; 7032FA16 CRC32;



Query Match 29.7%; Score 217; DB 1; Length 695;

Best Local Similarity 32.4%; Pred. No. 4.16e-21;

Matches 34; Conservative 26; Mismatches 43; Indels 2; Gaps 2; 

Db 23 CHCSNRVFLCQESKVTEIPSDLPRNAIELRFVLTKLRVIQKGAFSGFGDLEKIEISQNDV 82 

I III : h : ll|::|f :|:|: :: I III: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 83 LEVIEADVFSNLPKLHEIRIEKANNLLYINPEAFQNLPNLQYLLI 127 

: I :| I I I : : :|:: I . I: I :|| !h 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 13

ID FSHR_EQUAS STANDARD; PRT; 687 AA.
AC Q95179;
DT 01-NOV-1997 (REL. 35, CREATED)
DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE FOLLICLE STIMULATING HORMONE RECEPTOR PRECURSOR (FSH-R) (FOLLITROPIN
DE RECEPTOR).
GN FSHR.
OS EQUUS ASINUS (DONKEY).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC PERISSODACTYLA; EQUIDAE; EQUUS.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=TESTIS;
RX MEDLINE; 97338913.
RA RICHARD F., MARTINAT N., REMY J.-J., SALESSE R., COMBARNOUS Y.;
RT "Cloning, sequencing and in vitro functional expression of
RT recombinant donkey follicle-stimulating hormone receptor: a new
RT insight into the binding specificity of gonadotropin receptors.";
RL J. MOL. ENDOCRINOL. 18:193-202(1997).
CC -!- FUNCTION: RECEPTOR FOR FOLLICLE STIMULATING HORMONE. THE ACTIVITY
CC OF THIS RECEPTOR IS MEDIATED BY G PROTEINS WHICH ACTIVATE
CC ADENYLATE CYCLASE.
CC -!- SUBCELLULAR LOCATION: INTEGRAL MEMBRANE PROTEIN.
CC -!- SIMILARITY: HIGHLY SIMILAR TO LSH AND TSH RECEPTORS.
CC -!- SIMILARITY: BELONGS TO FAMILY 1 OF G-PROTEIN COUPLED RECEPTORS.
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
DR EMBL; U73659; G1658014; -.
DR GCRDB; GCR_1116; -.
DR PROSITE; PS00237; G_PROTEIN_RECEPTOR; 1.
DR PFAM; PF00001; 7tm_1; 1.
DR PFAM; PF00560; LRR; 3.
DR HSSP; P23945; 1XUN.
KW G-PROTEIN COUPLED RECEPTOR; TRANSMEMBRANE; GLYCOPROTEIN; SIGNAL;
KW PHOSPHORYLATION; REPEAT; LEUCINE-REPEAT.
FT SIGNAL 1 17 POTENTIAL.
FT CHAIN 18 687 FOLLICLE STIMULATING HORMONE RECEPTOR.
FT DOMAIN 18 358 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 359 379 1 (POTENTIAL).
FT DOMAIN 380 390 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 391 413 2 (POTENTIAL).
FT DOMAIN 414 435 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 436 457 3 (POTENTIAL).
FT DOMAIN 458 477 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 478 500 4 (POTENTIAL).
FT DOMAIN 501 520 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 521 542 5 (POTENTIAL).
FT DOMAIN 543 565 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 566 589 6 (POTENTIAL).
FT DOMAIN 590 600 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 601 622 7 (POTENTIAL).
FT DOMAIN 623 687 CYTOPLASMIC (POTENTIAL).
FT DISULFID 434 509 BY SIMILARITY.
FT CARBOHYD 191 191 POTENTIAL.
FT CARBOHYD 199 199 POTENTIAL.
FT CARBOHYD 293 293 POTENTIAL.
SQ SEQUENCE 687 AA; 76937 MW; 53777553 CRC32;

Query Match [illegible]; Score 215; DB 1; Length 687;

Best Local Similarity 32.4%; Pred. No. 9.72e-21;

Matches 34; Conservative 27; Mismatches 42; Indels 2; Gaps 2;



Db 23 CHYSNRVFLCQESKVTEIPSDLPRNALELRFVLTKLRVIPKGAFSGFGDLKKIEISQNDV 82 

I II: I: : ll|::|| ;|:|: :: II III: : l|:|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62

Db 83 LEVIEANVFSNLPKLHEIRIEKANNLLYIDHDAFQNLPNLQYLLI 127 

: I I I I : : :|" I |: I :|| ||: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105



RESULT 14

ID PGS1_BOVIN STANDARD; PRT; 369 AA.
AC P21809; P79259;
DT 01-MAY-1991 (REL. 18, CREATED)
DT 15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE BONE/CARTILAGE PROTEOGLYCAN I PRECURSOR (BIGLYCAN) (LEUCINE-RICH PG I)
DE (PG-S1).
GN BGN.
OS BOS TAURUS (BOVINE).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC ARTIODACTYLA; RUMINANTIA; PECORA; BOVOIDEA; BOVIDAE; BOVINAE; BOS.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=AORTA;
RX MEDLINE; 96113563.
RA XU J.H., RADHAKRISHNAMURTHY B., SRINIVASAN S.R., BERENSON G.S.;
RT "Primary structure of bovine aorta biglycan core protein deduced from
RT cloned CDNA.";
RL BIOCHEM. MOL. BIOL. INT. 37:263-272(1995).
RN [2]
RP SEQUENCE OF 38-369.
RC TISSUE=CARTILAGE;
RX MEDLINE; 89255324.
RA NEAME P.J., CHOI H.U., ROSENBERG L.C.;
RT "The primary structure of the core protein of the small, leucine-rich
RT proteoglycan (PG I) from bovine articular cartilage.";
RL J. BIOL. CHEM. 264:8653-8661(1989).
RN [3]
RP SEQUENCE OF 38-63.
RC TISSUE=CARTILAGE;
RX MEDLINE; 89123388.
RA CHOI H.U., JOHNSON T.L., PAL S., TANG L.H., ROSENBERG L.C.,
RA NEAME P.J.;
RT "Characterization of the dermatan sulfate proteoglycans, DS-PGI and
RT DS-PGII, from bovine articular cartilage and skin isolated by octyl-
RT sepharose chromatography.";
RL J. BIOL. CHEM. 264:2876-2884(1989).


CC -!- TISSUE SPECIFICITY: FOUND IN THE EXTRACELLULAR MATRICES OF SEVERAL
CC CONNECTIVE TISSUES, SPECIALLY IN ARTICULAR CARTILAGES.
CC -!- PTM: THE TWO GLYCOSAMINOGLYCAN CHAINS ATTACHED TO BIGLYCAN CAN BE
CC EITHER CHONDROITIN SULFATE OR DERMATAN SULFATE.
CC -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS
CC FAMILY.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 10.
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
DR EMBL; S82652; G1835865; -.
DR PIR; A33701; A33701.
DR PFAM; PF00560; LRR; 5.
KW GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN;
KW SIGNAL; REPEAT; LEUCINE-REPEAT.
FT SIGNAL 1 19 POTENTIAL.
FT PROPEP 20 37
FT CHAIN 38 369 BONE/CARTILAGE PROTEOGLYCAN I.
FT DOMAIN 93 316 LEUCINE-RICH REPEATS.
FT REPEAT 93 106 LRR 1.
FT REPEAT 117 130 LRR 2.
FT REPEAT 141 154 LRR 3.
FT REPEAT 162 175 LRR 4.
FT REPEAT 186 199 LRR 5.
FT REPEAT 211 224 LRR 6.
FT REPEAT 232 245 LRR 7.
FT REPEAT 256 269 LRR 8.
FT REPEAT 280 288 LRR 9.
FT REPEAT 303 316 LRR 10.
FT CARBOHYD 42 42 GLYCOSAMINOGLYCAN.
FT CARBOHYD 48 48 GLYCOSAMINOGLYCAN.
FT CARBOHYD 271 271
FT CARBOHYD 312 312
FT DISULFID 64 77
FT DISULFID 322 355
FT CONFLICT 152 152 C -> V (IN REF. 2).
FT CONFLICT 188 188 C -> E (IN REF. 2).
FT CONFLICT 354 354 A -> R (IN REF. 2).
FT CONFLICT 368 369 KK -> Y (IN REF. 2).
SQ SEQUENCE 369 AA; 41509 MW; F1CC673B CRC32;



Query Match 28.9%; Score 211; DB 1; Length 369; 

Best Local Similarity 38.4%; Pred. No. 5.29e-20;

Matches 33; Conservative 24; Mismatches 26; Indels 3; Gaps 2;

Db 153 EIPPNLPSSLVELRIHDNRIRKVPKGVFSGLRNMNCIEMGGNPLENSGFEPGAFDGLK-L 211 

111:111 ::||:|: :| |: :| I |: :::: |::: I : I : I IhNI I 
Qy 19 EIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI--SDIAPDAFQGLKSL 76

Db 212 NYLRISEAKLTGIPKDLPETLNELHL 237 

I : hi 1:1 I : I hi 
Qy 77 TSLVLYGNKITEIAKGLFDGLVSLQL 102



RESULT 15

ID PGS2_CHICK STANDARD; PRT; 357 AA.

AC P28675;

DT 01-DEC-1992 (REL. 24, CREATED)

DT 01-DEC-1992 (REL. 24, LAST SEQUENCE UPDATE)

DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)

DE BONE PROTEOGLYCAN II PRECURSOR (PG-S2) (DECORIN).

OS GALLUS GALLUS (CHICKEN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ARCHOSAURIA; AVES;

OC NEOGNATHAE; GALLIFORMES; PHASIANIDAE; PHASIANINAE; GALLUS.

RN [1]

RP SEQUENCE FROM N.A., AND PARTIAL SEQUENCE.

RC STRAIN=WHITE LEGHORN; TISSUE=CORNEA;

RX MEDLINE; 92296755.

RA LI W., VERGNES J.P., CORNUET P.K., HASSEL J.R.;

RT "cDNA clone to chick corneal chondroitin/dermatan sulfate 

RT proteoglycan reveals identity to decorin."; 

RL ARCH. BIOCHEM. BIOPHYS. 296:190-197(1992). 

CC -!- FUNCTION: BINDS TO TYPE I AND TYPE II COLLAGEN AND AFFECTS THE
CC RATE OF FIBRILS FORMATION, ALSO BINDS TO FIBRONECTIN AND TGF-
CC BETA.
CC -!- PTM: THE GLYCOSAMINOGLYCAN CHAIN ATTACHED TO DECORIN CAN BE EITHER
CC CHONDROITIN SULFATE OR DERMATAN SULFATE DEPENDING UPON THE
CC TISSUE OF ORIGIN (BY SIMILARITY).
CC -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS
CC FAMILY.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 10.

CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; X63797; G62888; -.
DR PIR; S22197; S22197.
DR PIR; S24317; S24317.
DR PFAM; PF00560; LRR; 5.
KW GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN;
KW REPEAT; LEUCINE-REPEAT; SIGNAL.


FT SIGNAL 1 16
FT PROPEP 17 30
FT CHAIN 31 357 BONE PROTEOGLYCAN II.
FT DOMAIN 75 306 LEUCINE-RICH REPEATS.
FT REPEAT 75 96 LRR 1.
FT REPEAT 97 120 LRR 2.
FT REPEAT 121 143 LRR 3.
FT REPEAT 144 165 LRR 4.
FT REPEAT 166 191 LRR 5.
FT REPEAT 192 215 LRR 6.
FT REPEAT 216 236 LRR 7.
FT REPEAT 237 260 LRR 8.
FT REPEAT 261 283 LRR 9.
FT REPEAT 284 306 LRR 10.
FT CARBOHYD 34 34 GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT CARBOHYD 209 209 POTENTIAL.
FT CARBOHYD 260 260 POTENTIAL.
FT DISULFID 52 65 BY SIMILARITY.
FT DISULFID 311 344 BY SIMILARITY.



SQ SEQUENCE 357 AA; 39687 MW; 48F51E32 CRC32; 

Query Match 27.9%; Score 204; DB 1; Length 357; 

Best Local Similarity 33.7%; Pred. No. 1.01e-18;

Matches 32; Conservative 17; Mismatches 46; Indels 0; Gaps 0; 

Db 56 CQCHLRVVQCSDLGLERVPKDLPPDTTLLDLQNNKITEIKEGDFKNLKNLHALILVNNKI 115

I I :hl II :| :|| : |: I I I I I |:| : : :| I 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62

Db 116 SKISPAAFAPLKKLERLYLSKNNLKELPENMPKSL 150 

I hi II II I II I:: I:: : :l 
Qy 63 SDIAPDAFQGLKSLTSLVLYGNKITEIAKGLFDGL 97 



Search completed: Fri May 28 08:43:33 1999 
Job time : 22 secs.

Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

ch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 08:43:51 1999; MasPar time 10.05 Seconds

570.514 Million cell updates/sec 

Tabular output not generated. 

Title: >US-09-191-647-3 

Description: (1-105) from US09191647.pep

Perfect Score: 730

Sequence: 1 SPCTCSNNIVDCRGKGLMEI...ITEIAKGLFDGLVSLQLLLL 105

Scoring table: PAM 150
Gap 11

Searched: 179066 seqs, 54579741 residues

Post-processing: Minimum Match 0%

Listing first 45 summaries

Database: sptrembl9

1:sp_archea 2:sp_bacteria 3:sp_fungi 4:sp_human
5:sp_invertebrate 6:sp_mammal 7:sp_mhc 8:sp_organelle
9:sp_phage 10:sp_plant 11:sp_rodent 12:sp_unclassified
13:sp_vertebrate 14:sp_virus

Statistics: Mean 41.576; Variance 84.632; scale 0.491

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution. 
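
The "% Query Match" figure reported for each hit appears to be the hit's score expressed as a percentage of the Perfect Score for the query (730 in this run); for instance, a score of 698 gives 95.6. This relation is inferred from the listed values, not stated by the program. A minimal Python sketch under that assumption (the function name `query_match_percent` is hypothetical, not part of the search software):

```python
def query_match_percent(score: int, perfect_score: int) -> float:
    """Return the hit score as a percentage of the perfect score, to one decimal.

    Assumption: this is how the '% Query Match' column is derived; the
    search program's own documentation is not quoted here.
    """
    if perfect_score <= 0:
        raise ValueError("perfect_score must be positive")
    return round(100.0 * score / perfect_score, 1)


# Perfect Score for this query, as reported in the run header.
PERFECT_SCORE = 730

# (score, listed % query match) pairs copied from the first summaries.
listed_rows = [(698, 95.6), (578, 79.2), (228, 31.2), (221, 30.3)]

for score, listed in listed_rows:
    computed = query_match_percent(score, PERFECT_SCORE)
    print(f"score={score:4d} listed={listed:5.1f} computed={computed:5.1f}")
```

The same check can be used to spot OCR damage in a summary row: a listed percentage that disagrees with the recomputed one suggests a misread digit in either column.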



        % Query
Score   Match    Length  DB  ID      Description             Pred. No.

698     95.6     1523    11  088280  MEGF5.                  9.33e-113
578     79.2     1531    11  088279  MEGF4.                  9.68e-89
228     31.2     1496    4   092626  MYELOBLAST KIAA0230 (F  7.27e-22
221     30.3     369     6   046390  BIGLYCAN PRECURSOR.     1.24e-20
221     30.3     688     11  064183  FOLLICLE-STIMULATING H  1.24e-20
220     30.1     372     6   046403  BIGLYCAN.               1.86e-20
219     30.0     259     6   Q28574  FOLLITROPIN RECEPTOR P  2.79e-20
214     29.3     1091    11  P70193  MEMBRANE GLYCOPROTEIN.  2.09e-19
210     28.8     134     6   Q28573  FOLLITROPIN RECEPTOR P  1.04e-18
210     28.8     516     4   043300  KIAA0416.               1.04e-18
209     28.6     313     4   Q13288  P37NB.                  1.55e-18
209     28.6     904     4   015455  TOLL-LIKE RECEPTOR 3.   1.55e-18
191     26.2     360     6   Q28888  DECORIN.                1.90e-15
191     26.2     522     4   043354  BAC CLONE GS099H08, CO  1.90e-15
190     26.0     184     4   060803  DJ63G5.3 (PUTATIVE LEO  2.81e-15
190     26.0     360     6   046542  DERMATAN SULFATE PROTE  2.81e-15
190     26.0     961     5   P90920  K07A12.2 PROTEIN.       2.81e-15
188     25.8     653     5   002329  T23G11.6 PROTEIN.       6.13e-15
188     25.8     839     4   000206  TOLL PROTEIN HOMOLOG.   6.13e-15
187     25.6     359     4   015335  CHONDROADHERIN.         9.05e-15



135 6 046377 

322 11 P70186 

322 4 099645 

358 11 055226 

321 6 P79119 

603 11 070211 

358 11 070210 

316 13 Q90944 

680 5 093374 

610 5 021604 

428 4 014498 

1385 5 026388 

353 13 042235 

892 5 P91644 

907 4 075473 

1389 5 Q24591 

331 13 093233 

716 11 Q61809 

1066 5 Q18902 

713 4 075325 

733 5 024250 

352 6 062702 

352 6 028032 

603 5 022075 

784 4 060603 



BIGLYCAN (FRAGMENT). 
PROTEOGLYCAN PRECURSOR 
DERMATAN SULFATE PROTE 



EPIPHYCAN. 

INSULIN-LIKE GROWTH FA 
CHONDROADHERIN. 
PROTEOGLYCAN CORE PROT 
C44H4.3 PROTEIN. 
M88.6 PROTEIN. 
ISLR PRECURSOR. 
TLR-TOLL-LIKE RECEPTOR 
KERATAN SULFATE PROTEO 
KEK2 PRECURSOR (FRAGME 
ORPHAN G PROTEIN-COUPL 
WHEELER. 

PHOSPHOLIPASE A2 INHIB 
LEUCINE-RICH-REPEAT PR 
CODED FOR BY C. ELEGAN 
GLIOMA AMPLIFIED ON CH 
TARTAN PROTEIN PRECURS 
CORNEAL KERATAN SULFAT 
CORNEAL KERATAN SULFAT 
T01G9.3 PROTEIN. 
TOLL/INTERLEUKIN-1 REC 



RESULT 1 
ID 088280 PRELIMINARY; PRT; 1523 AA. 
AC 088280; 
DT 01-NOV-1998 (TREMBLREL. 08, CREATED) 
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE) 
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 
DE MEGF5. 
GN MEGF5. 
OS RATTUS NORVEGICUS (RAT). 
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS. 
RN [1] 
RP SEQUENCE FROM N.A. 
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN; 
RX MEDLINE; 98360089. 
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.; 
RT "Identification of high-molecular-weight proteins with multiple 
RT EGF-like motifs by motif-trap screening."; 
RL GENOMICS 51:27-34(1998). 
DR EMBL; AB011531; D1033424; -. 
DR PROSITE; PS01185; CTCK_1; 1. 
DR PROSITE; PS01186; EGF_2; 7. 
DR PROSITE; PS01187; EGF_CA; 2. 
KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 
SQ SEQUENCE 1523 AA; 167767 MW; 2BD845D0 CRC32; 



Query Match 95.6%; Score 698; DB 11; Length 1523; 
Best Local Similarity 94.3%; Pred. No. 9.33e-113; 
Matches 99; Conservative 4; Mismatches 2; Indels 0; Gaps 0; 

Db 282 SACSCSNNIVDCRGKGLTEIPANLPEGIVEIRLEQNSIKSIPAGAFIQYKKLKRIDISKN 341 
       |:|:||||||||||||| |||||||||||||||||||||:|||||| ||||||||||||| 
Qy   1 SPCTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKN 60 

Db 342 QISDIAPDAFQGLKSLTSLVLYGNKITEIPKGLFDGLVSLQLLLL 386 
       |||||||||||||||||||||||||||||:||||||||||||||| 
Qy  61 QISDIAPDAFQGLKSLTSLVLYGNKITEIAKGLFDGLVSLQLLLL 105 
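
The percentages printed with each result can be checked against the counts beside them. The relationships below are assumptions inferred from the figures in this report (they reproduce the values for RESULT 1 and the other results), not definitions given by the program; the function names are hypothetical. A minimal sketch:

```python
def query_match_percent(score, perfect_score):
    """Assumed: alignment score as a percentage of the query's Perfect Score."""
    return 100.0 * score / perfect_score

def best_local_similarity_percent(matches, conservative, mismatches, indels):
    """Assumed: identical positions as a percentage of all aligned columns."""
    aligned_columns = matches + conservative + mismatches + indels
    return 100.0 * matches / aligned_columns

# RESULT 1 of this search: Score 698 against a Perfect Score of 730,
# with Matches 99; Conservative 4; Mismatches 2; Indels 0.
qm = query_match_percent(698, 730)
bls = best_local_similarity_percent(99, 4, 2, 0)
print(f"Query Match {qm:.1f}%; Best Local Similarity {bls:.1f}%")
# prints "Query Match 95.6%; Best Local Similarity 94.3%"
```

The same arithmetic is a quick way to spot OCR damage in the statistics lines: a percentage that does not agree with its counts has probably lost or gained a digit.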



RESULT 2 

ID 088279 PRELIMINARY; PRT; 1531 AA. 
AC 088279; 

DT 01-NOV-1998 (TREMBLREL. 08, CREATED) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE) 



Tue Jun 1 10:16:16 1999 



US-09-191-647-3.rspt 



Page 2 



DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 
DE MEGF4. 
GN MEGF4. 
OS RATTUS NORVEGICUS (RAT). 
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS. 
RN [1] 
RP SEQUENCE FROM N.A. 
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN; 
RX MEDLINE; 98360089. 
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.; 
RT "Identification of high-molecular-weight proteins with multiple 
RT EGF-like motifs by motif-trap screening."; 
RL GENOMICS 51:27-34(1998). 
DR EMBL; AB011530; D1033423; -. 
DR PROSITE; PS01185; CTCK_1; 1. 
DR PROSITE; PS01186; EGF_2; 8. 
DR PROSITE; PS01187; EGF_CA; 2. 
KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 
SQ SEQUENCE 1531 AA; 167497 MW; 5C5EBDF4 CRC32; 

Query Match 79.2%; Score 578; DB 11; Length 1531; 
Best Local Similarity 71.8%; Pred. No. 9.68e-89; 
Matches 74; Conservative 19; Mismatches 10; Indels 0; Gaps 0; 

Db 286 CSCSNGIVDCRGKGLTAIPANLPETMTEIRLELNGIKSIPPGAFSPYRKLRRIDLSNNQI 345 

hill MINIM Mill : UNI |:||:||:|||: 1 : 1 1 : 1 1 1 : 1 ; 1 1 1 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 346 AEIAPDAFQGLRSLNSLVLYGNKITDLPRGVFGGLYTLQLLLL 388 

::IHIIIIII:I 1 1 1 M 1 1 1 1 1 : : : : I : I II :|||||| 
Qy 63 SDIAPDAFQGLKSLTSLVLYGNKITEIAKGLFDGLVSLQLLLL 105 
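
Each result carries a statistics line of the form "Query Match ...; Score ...; DB ...; Length ...;". For anyone tabulating these by machine, the following is a minimal sketch of a parser for that one line. The regular expression and the function name are hypothetical, and the pattern assumes cleanly transcribed text: lines with OCR damage (a comma for the decimal point, a missing percent sign) will not match and yield None.

```python
import re

# Pattern for the per-result statistics line as it appears in this report.
STATS_RE = re.compile(
    r"Query Match\s+(?P<match>[\d.]+)%;\s+Score\s+(?P<score>\d+);\s+"
    r"DB\s+(?P<db>\d+);\s+Length\s+(?P<length>\d+);"
)

def parse_stats_line(line):
    """Return the four fields as a dict, or None if the line does not match."""
    m = STATS_RE.search(line)
    if m is None:
        return None
    return {
        "query_match": float(m["match"]),
        "score": int(m["score"]),
        "db": int(m["db"]),
        "length": int(m["length"]),
    }

row = parse_stats_line("Query Match 79.2%; Score 578; DB 11; Length 1531;")
print(row)
```

Returning None for damaged lines, rather than guessing, keeps unreadable entries visible so they can be corrected by hand.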



RESULT 3 

ID Q92626 PRELIMINARY; PRT; 1496 AA. 

AC Q92626; 

DT 01-FEB-1997 (TREMBLREL. 02, CREATED) 

DT 01-FEB-1997 (TREMBLREL. 02, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE MYELOBLAST KIAA0230 (FRAGMENT). 
GN KIAA0230. 
OS HOMO SAPIENS (HUMAN). 
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 
OC CATARRHINI; HOMINIDAE; HOMO. 
RN [1] 
RP SEQUENCE FROM N.A. 
RC TISSUE=BONE MARROW; 
RX MEDLINE; 97191544. 
RA NAGASE T., SEKI N., ISHIKAWA K., OHIRA M., KAWARABAYASI Y., OHARA O., 
RA TANAKA A., KOTANI H., MIYAJIMA N., NOMURA N.; 
RT "Prediction of the coding sequences of unidentified human genes. VI. 
RT The coding sequences of 80 new genes (KIAA0201-KIAA0280) deduced by 
RT analysis of cDNA clones from cell line KG-1 and brain."; 
RL DNA RES. 3:321-329(1996). 
DR EMBL; D86983; D1013908; -. 
DR PFAM; PF00047; ig; 4. 
DR PFAM; PF00093; vwc; 1. 
DR PFAM; PF00141; peroxidase; 1. 
DR PFAM; PF00560; LRR; 3. 
FT NON_TER 1 1 
SQ SEQUENCE 1496 AA; 167209 MW; 5731EE51 CRC32; 

Query Match 31.2%; Score 228; DB 4; Length 1496; 
Best Local Similarity 37.4%; Pred. No. 7.27e-22; 
Matches 40; Conservative 24; Mismatches 38; Indels 5; Gaps 4; 

Db 55 SRCLCFRTTVRCM-HLLLEAVPAVAPQTSILDLRF--NRIREIQPGAFRRLRNLNTLLLN 111 

III II 1:1 HI I: :|:::|: I |: I :||| : ::|: : :: 
Qy 1 SPCTCSNNIVDCRGKGLME-IPANLPE-GIVEIRLEQNSIKAIPAGAFTQYKKLKRIDIS 58 

Db 112 NNQIKRIPSGAFEDLENLKYLYLYKNEIQSIDRQAFKGLASLEQLYL 158 

:HI l:J II: I :| I II I I I : I il II: I I 






Qy 59 KNQISDIAPDAFQGLKSLTSLVLYGNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 4 

ID 046390 PRELIMINARY; PRT; 369 AA. 

AC 046390; 

DT 01-JUN-1998 (TREMBLREL. 06, CREATED) 

DT 01-JUN-1998 (TREMBLREL. 06, LAST SEQUENCE UPDATE) 

DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE) 

DE BIGLYCAN PRECURSOR. 

OS OVIS ARIES (SHEEP). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC ARTIODACTYLA; RUMINANTIA; PECORA; BOVOIDEA; BOVIDAE; CAPRINAE; OVIS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=CHOROID PLEXUS; 

RA BRUETT L., CLEMENTS J.E.; 

RL SUBMITTED (NOV-1997) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; AF034842; G2655356; -. 

KW SIGNAL. 

FT SIGNAL 1 18 POTENTIAL. 

FT CHAIN 38 369 BIGLYCAN. 

SQ SEQUENCE 369 AA; 41523 MW; A0A9F549 CRC32; 

Query Match 30.3%; Score 221; DB 6; Length 369; 

Best Local Similarity 38.6%; Pred. No. 1.24e-20; 

Matches 34; Conservative 25; Mismatches 26; Indels 3; Gaps 2; 

Db 151 LVEIPPNLPSSLVELRIHDNRIRKVPKGVFSGLRNMNCIEMGGNPLENSGFEPGAFDGLK 210 

hllMII ::||:|: :| |: :| I |: :::: !::: I : I : I ||:||| 
Qy 17 LMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI--SDIAPDAFQGLK 74 

Db 211 -LNYLRISEAKLTGIPKDLPETLNELHL 237 

II: hi 1:1 I : I hi 
Qy 75 SLTSLVLYGNKITEIAKGLFDGLVSLQL 102 



RESULT 5 

ID Q64183 PRELIMINARY; PRT; 688 AA. 

AC Q64183; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE FOLLICLE-STIMULATING HORMONE RECEPTOR. 

OS RATTUS NORVEGICUS (RAT). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 92149579. 

RA HECKERT L.L., DALEY I.J., GRISWOLD M.D.; 

RT "Structural organization of the follicle-stimulating hormone receptor 

RT gene."; 

RL MOL. ENDOCRINOL. 6:70-80(1992). 

DR EMBL; S81119; E91753; JOINED. 

DR EMBL; S81121; E91753; JOINED. 

DR EMBL; S81171; E91753; JOINED. 

DR EMBL; S81174; E91753; JOINED. 
DR EMBL; S81198; E91753; -. 
DR EMBL; S81117; E91753; JOINED. 
DR EMBL; S81178; E91753; JOINED. 
DR EMBL; S81185; E91753; JOINED. 
DR EMBL; S81194; E91753; JOINED. 
DR EMBL; S81183; E91753; JOINED. 
DR PFAM; PF00001; 7tm_1; 1. 

DR PFAM; PF00560; LRR; 3. 

SQ SEQUENCE 688 AA; 77341 MW; E454B260 CRC32; 

Query Match 30.3%; Score 221; DB 11; Length 688; 

Best Local Similarity 32.4%; Pred. No. 1.24e-20; 

Matches 34; Conservative 27; Mismatches 42; Indels 2; Gaps 2; 

Db 23 CHCSNRVFLCQDSKVTEIPTDLPRNAIELRFVLTKLRVIPKGSFAGFGDLEKIEISQNDV 82 
I III : I: : lll::|| :|:|: :: II |:|: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 83 LEVIEADVFSNLPKLHEIRIEKANNLLYINPEAFQNLPSLRYLLI 127 

: I :| I I I : : :|:: I |: I II: II: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 6 

ID 046403 PRELIMINARY; PRT; 372 AA. 

AC 046403; 

DT 01-JUN-1998 (TREMBLREL. 06, CREATED) 

DT 01-JUN-1998 (TREMBLREL. 06, LAST SEQUENCE UPDATE) 

DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE) 

DE BIGLYCAN. 

OS EQUUS CABALLUS (HORSE). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC PERISSODACTYLA; EQUIDAE; EQUUS. 
RN [1] 
RP SEQUENCE FROM N.A. 

RA RICHARDSON D.W., DODGE G.R.; 

RL SUBMITTED (NOV-1997) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; AF035934; G2662531; -. 

SQ SEQUENCE 372 AA; 41924 MW; 097E7BA6 CRC32; 

Query Match 30.1%; Score 220; DB 6; Length 372; 

Best Local Similarity 38.6%; Pred. No. 1.86e-20; 

Matches 34; Conservative 25; Mismatches 26; Indels 3; Gaps 2; 

Db 154 LVEIPPNLPSSLVELRIHDNRIRKVPKGVFSGLRNMNCIEMGGNPLENSGFQPGAFDGLK 213 

1:111:111 ::||:|: :| |: :| I |: :::: |::: I : I : I ||:||| 
Qy 17 LMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI--SDIAPDAFQGLK 74 

Db 214 -LNYLRISEAKLTGIPKDLPETLNELHL 240 

I I : |:| |:| I : I hi 
Qy 75 SLTSLVLYGNKITEIAKGLFDGLVSLQL 102 



RESULT 7 

ID Q28574 PRELIMINARY; PRT; 259 AA. 

AC Q28574; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE FOLLITROPIN RECEPTOR PRECURSOR. 

OS OVIS ARIES (SHEEP). 
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 
OC ARTIODACTYLA; RUMINANTIA; PECORA; BOVOIDEA; BOVIDAE; CAPRINAE; OVIS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=DORSET-LEICESTER-SUFFOLK; TISSUE=TESTIS; 
RX MEDLINE; 93176195. 
RA KHAN H., YARNEY T.A., SAIRAM M.R.; 

RT "Cloning of alternately spliced mRNA transcripts coding for variants 

RT of ovine testicular follitropin receptor lacking the G protein 

RT coupling domains."; 

RL BIOCHEM. BIOPHYS. RES. COMMUN. 190:888-894(1993). 
DR EMBL; L12767; G484229; -. 

DR PFAM; PF00560; LRR; 3. 

KW SIGNAL. 

FT SIGNAL 1 17 POTENTIAL. 

FT CHAIN 18 59 FOLLITROPIN RECEPTOR. 
SQ SEQUENCE 259 AA; 29095 MW; 7095A385 CRC32; 

Query Match 30.0%; Score 219; DB 6; Length 259; 
Best Local Similarity 31.4%; Pred. No. 2.79e-20; 

Matches 33; Conservative 28; Mismatches 42; Indels 2; Gaps 2; 

Db 23 CHCSNGVFLCQDSKVTEMPSDLPRDAVELRFVLTKLRVIPEGAFSGFGDLEKIEISQNDV 82 

I III : I: : |:|::|| l|:|: :: II III: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 83 LEVIEANVFSNLPKLHEIRIEKANNLLYIDPDAFQNLPNLRYLLI 127 



: I " I I I : : :|:: I h I :|: lh 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 8 


ID P70193 PRELIMINARY; PRT; 1091 AA. 
AC P70193; 
DT 01-FEB-1997 (TREMBLREL. 02, CREATED) 
DT 01-FEB-1997 (TREMBLREL. 02, LAST SEQUENCE UPDATE) 
DT 01-JAN-1999 (TREMBLREL. 09, LAST ANNOTATION UPDATE) 
DE MEMBRANE GLYCOPROTEIN. 
OS MUS MUSCULUS (MOUSE). 
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 
OC SCIUROGNATHI; MURIDAE; MURINAE; MUS. 
RN [1] 
RP SEQUENCE FROM N.A. 
RX MEDLINE; 96394313. 
RA SUZUKI Y., SATO N., TOHYAMA M., WANAKA A., TAKAGI T.; 
RT "cDNA cloning of a novel membrane glycoprotein that is expressed 
RT specifically in glial cells in the mouse brain LIG-1: A protein with 
RT leucine-rich repeats and immunoglobulin-like domains."; 
RL J. BIOL. CHEM. 271:22522-22527(1996). 
DR EMBL; D78572; D1012081; -. 
DR MGD; MGI:107935; IMG. 
DR PFAM; PF00047; ig; 3. 
DR PFAM; PF00560; LRR; 7. 
KW MEMBRANE. 



SQ SEQUENCE 1091 AA; 119283 MW; C0F262F9 CRC32; 

Query Match 29.3%; Score 214; DB 11; Length 1091; 

Best Local Similarity 42.5%; Pred. No. 2.09e-19; 

Matches 37; Conservative 19; Mismatches 27; Indels 4; Gaps 3; 

Db 330 AELS-SLSILRLSHNAISHIAEGAFKGLKSLRVLDLDHNEISGTIEDTSGAFTGLDNLSK 388 

1:1: :: :ll :hl |: III I |: :|: |:|| I :: II II :|: 
Qy 22 ANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQISD-I--APDAFQGLKSLTS 78 

Db 389 LTLFGNKIKSVAKRAFSGLESLEHLNL 415 

I MM :|| I || ||: | | 
Qy 79 LVLYGNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 9 

ID Q28573 PRELIMINARY; PRT; 134 AA. 

AC Q28573; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE FOLLITROPIN RECEPTOR PRECURSOR. 

GN SHPFSHRE. 

OS OVIS ARIES (SHEEP). 
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 
OC ARTIODACTYLA; RUMINANTIA; PECORA; BOVOIDEA; BOVIDAE; CAPRINAE; OVIS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=DORSET-LEICESTER-SUFFOLK; TISSUE=TESTIS; 
RX MEDLINE; 93176195. 
RA KHAN H., YARNEY T.A., SAIRAM M.R.; 
RT "Cloning of alternately spliced mRNA transcripts coding for variants 
RT of ovine testicular follitropin receptor lacking the G protein 
RT coupling domains."; 
RL BIOCHEM. BIOPHYS. RES. COMMUN. 190:888-894(1993). 
DR EMBL; L12766; G484227; -. 
DR PFAM; PF00560; LRR; 1. 
KW SIGNAL. 
FT SIGNAL 1 17 POTENTIAL. 
FT CHAIN 18 134 POTENTIAL. 
SQ SEQUENCE 134 AA; 15336 MW; 4D2F9627 CRC32; 

Query Match 28.8%; Score 210; DB 6; Length 134; 
Best Local Similarity 30.8%; Pred. No. 1.04e-18; 

Matches 32; Conservative 28; Mismatches 42; Indels 2; Gaps 2; 
Db 23 CHCSNGVFLCQDSKVTEMPSDLPRDAVELRFVLTKLRVIPEGAFSGFGDLEKIEISQNDV 82 

I III = I: : |:|::|| ||:|: :: II III: : I :|:|| |:: 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 83 LEVIEANVFSNLPKLHEIRIEKANNLLYIDPDAFQNLPNLRYLF 126 

: I :: I I I : : :|:: I |: I :|: |: 
Qy 63 SD-IAPDAFQGLKSLTSLVLY-GNKITEIAKGLFDGLVSLQLLL 104 



RESULT 10 

ID 043300 PRELIMINARY; PRT; 516 AA. 

AC 043300; 

DT 01-JUN-1998 (TREMBLREL. 06, CREATED) 
DT 01-JUN-1998 (TREMBLREL. 06, LAST SEQUENCE UPDATE) 
DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE) 
DE KIAA0416. 
GN KIAA0416. 
OS HOMO SAPIENS (HUMAN). 
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 
OC CATARRHINI; HOMINIDAE; HOMO. 
RN [1] 
RP SEQUENCE FROM N.A. 
RC TISSUE=BRAIN; 
RA ISHIKAWA K., NAGASE T., NAKAJIMA D., SEKI N., OHIRA M., MIYAJIMA N., 
RA TANAKA A., KOTANI H., NOMURA N., OHARA O.; 
RL SUBMITTED (OCT-1997) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; AB007876; D1025768; -. 

SQ SEQUENCE 516 AA; 59076 MW; 8EDFDB28 CRC32; 

Query Match 28.8%; Score 210; DB 4; Length 516; 

Best Local Similarity 32.0%; Pred. No. 1.04e-18; 

Matches 33; Conservative 22; Mismatches 48; Indels 0; Gaps 0; 

Db 38 CRCEKLLFYCDSQGFHSVPNATDKGSLGLSLRHNHITELERDQFASFSQLTWLHLDHNQI 97 

I I : : I : I: :| I : : I :| I : |: : | :: : ||| 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 98 STVKEDAFQGLYKLKELILSSNKIFYLPNTTFTQLINLQNLDL 140 

I : MINI I ;||| ::: 1 |;;|| | 
Qy 63 SDIAPDAFQGLKSLTSLVLYGNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 11 

ID Q13288 PRELIMINARY; PRT; 313 AA. 

AC Q13288; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 
DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 
DE P37NB. 
OS HOMO SAPIENS (HUMAN). 
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 

OC CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 97136875. 

RA KIM D., LAQUAGLIA M.P., YANG S.Y.; 

RT "A cDNA encoding a putative 37 kDa leucine-rich repeat (LRR) protein, 

RT p37NB, isolated from s-type neuroblastoma cell has a differential 

RT tissue distribution."; 
RL BIOCHIM. BIOPHYS. ACTA 1309:183-188(1996). 
DR EMBL; U32907; G1236329; -. 
DR PFAM; PF00560; LRR; 3. 
SQ SEQUENCE 313 AA; 36287 MW; AD538A24 CRC32; 

Query Match 28.6%; Score 209; DB 4; Length 313; 
Best Local Similarity 32.9%; Pred. No. 1.55e-18; 

Matches 28; Conservative 27; Mismatches 30; Indels 0; Gaps 0; 

Db 65 LDCQERKLVYVLPGWPQDLLHMLLARNKIRTLKNNMFSKFKKLKSLDLQQNEISKIESEA 124 

:||: : I: : : h :: : I :| |::: |: :|||| :|: |:|| I ::| 
Qy 10 VDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQISDIAPDA 69 



Db 125 FFGLNKLTTLLLQHNQIKVLTEEVF 149 



I II: 11:1:1 I I :: :| 
Qy 70 FQGLKSLTSLVLYGNKITEIAKGLF 94 



RESULT 12 

ID 015455 PRELIMINARY; PRT; 904 AA. 

AC 015455; 

DT 01-JAN-1998 (TREMBLREL. 05, CREATED) 

DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE) 
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 
DE TOLL-LIKE RECEPTOR 3. 
GN TLR3. 
OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 

OC CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RA ROCK F.L., HARDIMAN G., TIMANS J.C., KASTELEIN R.A., BAZAN J.F.; 

RL SUBMITTED (FEB-1997) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; U88879; G2459626; -. 

DR PFAM; PF00560; LRR; 12. 

SQ SEQUENCE 904 AA; 103828 MW; F857CE1C CRC32; 

Query Match 28.6%; Score 209; DB 4; Length 904; 

Best Local Similarity 32.7%; Pred. No. 1.55e-18; 

Matches 34; Conservative 26; Mismatches 41; Indels 3; Gaps 3; 

Db 28 CTVSHEVADCSHLKLTQVPDDLPTNITVLNLTHNQLRRLPAANFTRYSQLTSLDVGFNTI 87 

II I::: II .1 ::| :|| I : I :| :: :||: ||:| I :|:: I I 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 88 SKLEPELCQKLPMLKVLNLQHNELSQLSDKTFAFCTNLTELHLM 131 

I : I: II I II I ::::: I : I I |:|: 
Qy 63 SDIAPDAFQGLKSLTSLVLYGNKITEIA-KGL-F-DGLVSLQLL 103 



RESULT 13 

ID Q28888 PRELIMINARY; PRT; 360 AA. 

AC Q28888; Q28608; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 
DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 
DE DECORIN. 
OS ORYCTOLAGUS CUNICULUS (RABBIT). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC LAGOMORPHA; LEPORIDAE; ORYCTOLAGUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 95122319. 

RA ZHAN Q., BURROWS R., CINTRON C.; 
RT "Cloning and in situ hybridization of rabbit decorin in corneal 
RT tissues."; 
RL INVEST. OPHTHALMOL. VIS. SCI. 36:206-215(1995). 
RN [2] 
RP SEQUENCE OF 38-358 FROM N.A. 
RC TISSUE=CARTILAGE; 
RA HERING T.M., KOLLAR J.; 
RL SUBMITTED (NOV-1993) TO EMBL/GENBANK/DDBJ DATA BANKS. 
DR EMBL; S76584; G913375; -. 

DR EMBL; U03394; G415647; -. 

DR PFAM; PF00560; LRR; 5. 

SQ SEQUENCE 360 AA; 39896 MW; 1FAB2F8E CRC32; 

Query Match 26.2%; Score 191; DB 6; Length 360; 

Best Local Similarity 32.3%; Pred. No. 1.90e-15; 
Matches 32; Conservative 17; Mismatches 50; Indels 0; Gaps 0; 

Db 59 CQCHLRVVQCSDLGLDKVPKDLPPDTTLLDLQNNKITEIKDGDFKNLKNLHALILVNNKI 118 

I I :|:| II :l :H : h I I I I J hi : : :| I 
Qy 3 CTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQI 62 

Db 119 SKISPGAFTPLVKLERLYLSKNHLKELPEKMPKSLQELR 157 

I hi II I I I I I : I:: : :| |: 
Qy 63 SDIAPDAFQGLKSLTSLVLYGNKITEIAKGLFDGLVSLQ 101 



RESULT 14 

ID 043354 PRELIMINARY; PRT; 522 AA. 

AC 043354; 

DT 01-JUN-1998 (TREMBLREL. 06, CREATED) 

DT 01-JUN-1998 (TREMBLREL. 06, LAST SEQUENCE UPDATE) 

DT 01-JUN-1998 (TREMBLREL. 06, LAST ANNOTATION UPDATE) 

DE BAC CLONE GS099H08, COMPLETE SEQUENCE. 

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 

OC CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RA ROHLFING T., DAVID M., AHRENS C.; 
RL SUBMITTED (JAN-1998) TO EMBL/GENBANK/DDBJ DATA BANKS. 
RN [2] 
RP SEQUENCE FROM N.A. 
RA WATERSTON R.; 

RL SUBMITTED (JAN-1998) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; AC004010; G2781386; -. 

SQ SEQUENCE 522 AA; 57933 MW; 0E8FD1F7 CRC32; 

Query Match 26.2%; Score 191; DB 4; Length 522; 

Best Local Similarity 29.2%; Pred. No. 1.90e-15; 

Matches 31; Conservative 27; Mismatches 47; Indels 1; Gaps 1; 

Db 43 TACICATDIVSCTNKNLSKVPGNLFRLIKRLDLSYNRIGLLDSEWIPVSFAKLNTLILRH 102 

::| I: :|| I II :|:|| I : I I I : : : : ||: : : 
Qy 1 SPCTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFT-QYKKLKRIDISK 59 

Db 103 NNITSISTGSFSTTPNLKCLDLSSNKLKTVKNAVFQELKVLEVLLL 148 

I h I: :| :| I I :||: : :::|: I |::||| 
Qy 60 NQISDIAPDAFQGLKSLTSLVLYGNKITEIAKGLFDGLVSLQLLLL 105 



RESULT 15 

ID 060803 PRELIMINARY; PRT; 184 AA. 
AC 060803; 

DT 01-AUG-1998 (TREMBLREL. 07, CREATED) 
DT 01-AUG-1998 (TREMBLREL. 07, LAST SEQUENCE UPDATE) 
DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE) 
DE DJ63G5.3 (PUTATIVE LEUCINE RICH PROTEIN) (FRAGMENT). 
GN DJ63G5.3. 
OS HOMO SAPIENS (HUMAN). 
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 
OC CATARRHINI; HOMINIDAE; HOMO. 
RN [1] 

RP SEQUENCE FROM N.A. 
RA LLOYD D.; 

RL SUBMITTED (MAY-1998) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; Z94160; E1296586; -. 

FT NON_TER 1 1 
FT NON_TER 184 184 
SQ SEQUENCE 184 AA; 20630 MW; 7F359903 CRC32; 

Query Match 26.0%; Score 190; DB 4; Length 184; 
Best Local Similarity 32.2%; Pred. No. 2.81e-15; 

Matches 28; Conservative 31; Mismatches 26; Indels 2; Gaps 2; 

Db 6 IPQHINSTVHDLRLNENKLKAVLYSSLNRFGNLTDLNLTKNEISYIEDGAFLGQSSLQVL 65 

II : ::||::| :||: ::: :: :| ::::||:|| I II I '| I I 
Qy 20 IPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLKRIDISKNQISDIAPDAFQGLKSLTSL 79 

Db 66 QLGY-NKLSNLTEGMLRGMSRLQFLFV 91 

I I II::::: I:: I: ||:|:: 
Qy 80 VL-YGNKITEIAKGLFDGLVSLQLLLL 105 



Search completed: Fri May 28 08:44:23 1999 
Job time : 32 secs. 



Release 3.1A John F. Collins, Biocomputing Research Unit. 
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm 

Run on: Fri May 28 08:46:09 1999; MasPar time 7.71 Seconds 
        380.559 Million cell updates/sec 

Tabular output not generated. 



Title:          >US-09-191-647-4 
Description:    (1-138) from US09191647.pep 
Perfect Score:  995 
Sequence:       1 EGAFNGAASVQELMLTGNQL CQKPFFLKEIPIQGVGHPGI 138 

Scoring table:  PAM 150 
                Gap 11 

Searched:       170751 seqs, 21266608 residues 



Post-processing: Minimum Match 0% 

Listing first 45 summaries 



1:part1 2:part2 3:part3 4:part4 5:part5 6:part6 7:part7 
8:part8 9:part9 10:part10 11:part11 12:part12 13:part13 
14:part14 15:part15 16:part16 17:part17 18:part18 
19:part19 20:part20 21:part21 22:part22 23:part23 
24:part24 25:part25 26:part26 27:part27 28:part28 
29:part29 30:part30 31:part31 32:part32 33:part33 
34:part34 35:part35 36:part36 37:part37 38:part38 
39:part39 

Statistics: Mean 30.450; Variance 125.613; scale 0.242 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



                              SUMMARIES 

Result          Query
   No.  Score  Match %  Length  DB  ID      Description            Pred. No.
     1    678     68.1    1534  30  W46966  Amino acid sequence o  6.89e-56
     2    322     32.4    1480   5  R25079  Drosophila SLIT prote  1.88e-20
     3    246     24.7     196   5  R29102  Drosophila SLIT prote  3.47e-13
     4    217     21.8     560  12  R71294  Human glycoprotein V.  1.79e-10
     5    208     20.9    1091  27  W41641  Sequence used in dete  1.22e-09
     6    184     18.5     605  17  R85888  WD-40 domain-contg. i  1.90e-07
     7    179     18.0     603  17  R85889  WD-40 domain-contg. r  5.39e-07
     8    175     17.6     245  28  W39170  Human PKD1 protein fr  1.23e-06
     9    175     17.6     455  28  W39171  Human PKD1 protein fr  1.23e-06
    10    175     17.6    4302  29  W33396  Human PKD1 polypeptid  1.23e-06
    11    175     17.6    4302  28  W23830  Human PKD1 protein.    1.23e-06
    12    175     17.6    4302  19  W00870  Polycystic kidney dis  1.23e-06
    13    175     17.6    4339  15  R75916  Polycystic kidney dis  1.23e-06
    14    175     17.6    4339  19  R87539  Polycystic kidney dis  1.23e-06
    15    167     16.8    4303  17  R90302  Polycystic kidney dis  6.42e-06
    16    165     16.6     345  23  W09405  Pineal gland specific  9.67e-06
    17    155     15.6     784  39  W86350  Human DNAX toll-like   7.42e-05
    18    155     15.6     784  39  W90069  Human TNF-alpha conve  7.42e-05
    19    155     15.6     784  29  W48245  Human pro-tumour necr  7.42e-05
    20    154     15.5     661  28  W47274  Human B-cell activati  9.08e-05
    21    151     15.2     661  39  W87556  B cell surface protei  1.66e-04
    22    151     15.2     661  25  W28510  Product of clone J422  1.66e-04
    23    140     14.1    1025  18  W03185  Rice Xa21 disease res  1.50e-03
    24    138     13.9     293   1  P91368  45 kDa amino terminal  2.23e-03
    25    137     13.8     365  39  W86353  Partial human DNAX to  2.72e-03
    26    136     13.7    1112  16  R85299  Tomato pathogen resis  3.31e-03
    27    136     13.7    1112  16  R85298  Tomato pathogen resis  3.31e-03
    28    135     13.6     139   8  R42263  Decorin sequence PT-7  4.03e-03
    29    135     13.6     186   8  R42264  Decorin sequence PT-7  4.03e-03
    30    135     13.6     234   8  R42265  Decorin sequence PT-7  4.03e-03
    31    135     13.6     280   8  R42266  Decorin sequence PT-7  4.03e-03
    32    135     13.6     305   8  R42267  Decorin sequence PT-7  4.03e-03
    33    135     13.6     331   8  R42260  Mature decorin PT-65.  4.03e-03
    34    135     13.6     342  17  R89439  Human recombinant dec  4.03e-03
    35    135     13.6     353   1  R05160  Sequence of human bon  4.03e-03
    36    135     13.6    1388  18  R89471  Collagen/decorin fusi  4.03e-03
    37    132     13.3     904  39  W86351  Human DNAX toll-like   7.26e-03
    38    128     12.9     644  39  W82318  Human 7-transmembrane  1.58e-02
    39    128     12.9     799  39  W86352  Human DNAX toll-like   1.58e-02
    40    128     12.9     806  20  W09254  Tomato pathogen resis  1.58e-02
    41    128     12.9     806  16  R85301  Tomato pathogen resis  1.58e-02
    42    128     12.9     837  39  W86361  Human DNAX toll-like   1.58e-02
    43    128     12.9     863  14  R75919  Tomato Cf-9.           1.58e-02
    44    127     12.8     332  15  R87953  Bovine neurotrophic b  1.92e-02
    45    125     12.6     610  23  W18201  Platelet glycoprotein  2.83e-02




RESULT 1 
ID W46966 standard; Protein; 1534 AA. 
AC W46966; 
DT 06-JUL-1998 (first entry) 
DE Amino acid sequence of a human slit-like polypeptide. 
KW Slit-like protein; human; diagnosis; treatment; brain-specific disease; 
KW cancer; antibody. 
OS Homo sapiens. 
FH Key Location/Qualifiers 
FT Peptide 1..26 
FT /note= "signal peptide" 
FT Protein 27..1534 
FT /note= "mature protein" 
PN J10087699-A. 
PD 07-APR-1998. 
PF 15-JUL-1997; 205351. 
PR 16-JUL-1996; JP-186219. 
PA (ASAH ) ASAHI KASEI KOGYO KK. 
DR WPI; 98-267127/24. 
DR N-PSDB; V16978. 
PT Human Slit-like protein - useful for diagnosis and treatment of 
PT brain-specific diseases and cancers 
PS Disclosure; Pages 31-35; 45pp; Japanese. 
CC The present sequence represents a novel human slit-like protein (the 
CC mature protein is claimed in Claim 1). The slit-like polypeptide is 
CC useful for diagnosis and treatment of brain-specific diseases and 
CC cancers. Antibodies directed against the protein, or its fragments 
CC can also be used for diagnosing cancer. 
SQ Sequence 1534 AA; 



Query Match 68.1%; Score 678; DB 30; Length 1534; 

Best Local Similarity 66.7%; Pred. No. 6.89e-56; 

Matches 92; Conservative 28; Mismatches 16; Indels 2; Gaps 2; 

Db 582 dgafegaasvselhltanqlesirsgmfrg-ldglrtlmlrnnriscihndsftglrnvr 640 

:|!!:||IH II M : M ; I : : : : Ml | ||:|||||:| |:|: ||:|:|| : 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVR 60 

Db 641 llslydnqittvspgafdtlqslstlnllanpfncncql-awlggwlrkrkivtgnprcq 699 

IIIMII:II|::|||| II 1 1 1 1 :l 1 1 : 1 1 1 1 1 1 1 : 1 I II 1 1 1 1 1 : 1 1 : 1 1 1 1 1 1 
Qy 61 LLSLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRRIVSGNPRCQ 120 

Db 700 npdflrqiplqdvafpdf 717 

:| ll::||:| I: I : 
Qy 121 KPFFLKEIPIQGVGHPGI 138 



RESULT 2 

ID R25079 standard; Protein; 1480 AA. 

AC R25079; 

DT 05-JAN-1993 (first entry) 

DE Drosophila SLIT protein involved in axon pathway development. 

KW Neurogenesis; EGF-like repeats; epidermal growth factor; TAGON; 

KW embryonic CNS; leucine-rich repeat; Flank-LRR-Flank; 

KW midline glial cells; axonogenesis; cell-cell interaction; ss. 

OS Drosophila melanogaster. 
FH Key Location/Qualifiers 
FT peptide 1..36 
FT /label= signal 
FT domain 73..294 
FT /label= Flank_LRR_Flank_1 
FT /note= "mediates adhesive events" 
FT domain 295..518 
FT /label= Flank_LRR_Flank_2 
FT /note= "mediates adhesive events" 
FT domain 519..714 
FT /label= Flank_LRR_Flank_3 
FT /note= "mediates adhesive events" 
FT domain 715..910 
FT /label= Flank_LRR_Flank_4 
FT /note= "mediates adhesive events" 
FT region 911..1150 
FT /label= Tandem_EGF_like_repeats 
FT /note= "involved in protein-protein interactions" 
FT region 1353..1393 
FT /label= 7th_EGF_like_repeat 
FT /note= "involved in receptor-ligand interactions" 
FT region 1394..1404 
FT /label= alternative_splice_segment 
FT /note= "developmentally regulated" 
FT region 1405..1480 
FT /label= C-terminal_region 
PN WO9210518-A. 
PD 25-JUN-1992. 
PF 27-NOV-1991; U09055. 
PR 07-DEC-1990; US-624135. 
PA (UYYA ) UNIV YALE. 
PI Artavanis-Tsakonas S, Rothberg JM; 
DR WPI; 92-234590/28. 
DR N-PSDB; Q25811. 
PT SLIT protein and sequence elements for treating 
PT neuro-degenerative disease - useful for Alzheimer's disease, 

PT nerve damage and Parkinson's disease, for diagnosis of cancer 

PS Claim 1; Page 84-89; 122pp; English. 

CC The SLIT protein is necessary for normal development of the midline 
CC of the CNS, partic. the midline glial cells, and for the 
CC concomitant formation of the commissural axon pathways. The process 
CC is dependent on the level of SLIT protein expression. It appears 
CC that slit protein is excreted by the midline glial cells where it 
CC is synthesised and is eventually associated with the surfaces of 
CC axons that traverse them. The SLIT protein is tightly localised to 
CC the muscle attachment sites and to the sites of contact between 
CC adjacent pairs of cardioblasts as they coalesce to form the lumen 
CC of the larval heart. The SLIT protein defines a new set of 
CC molecules (TAGONS) which play a key role in axon outgrowth and 
CC pathfinding. slit can be used as a nerve regenerative in 
CC neurodegenerative diseases such as Alzheimer's Disease, spinal cord 
CC injuries, brain injuries, crushed (optic) nerve, amyotrophic lateral 
CC sclerosis, diabetes-caused nerve damage, Parkinson's Disease, 
CC strokes, epilepsy, multiple sclerosis, paraplegia retinal 
CC degeneration and facial nerve damage. The 4 "Flank-Leucine-rich 
CC region-Flank" domains, the C-terminal region and part of the 
CC alternative splice segment (i.e. GEGSTEPFTVT) are all individually 
CC claimed as are molecules comprising at least 1 Flank-LRR-Flank domain 
CC and at least 1 EGF-like repeat element from the SLIT protein. 
CC See also R29102. 

SQ Sequence 1480 AA; 

Query Match 32.4%; Score 322; DB 5; Length 1480; 

Best Local Similarity 36.7%; Pred. No. 1.88e-20; 

Matches 44; Conservative 36; Mismatches 38; Indels 2; Gaps 2; 

Db 365 alsglkqlttlvlygnkikdlpsgvfkg-lgslrllllnaneiscirkdafrdlhslsll 423 

: hi II : : : hi |::|: hi :l hh :hl I h II 
Qy 3 AFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLL 62 

Db 424 slydnniqsvangtfdamksmktvhlaknpficdcnlrw-vadylhknpietsgarcesp 482 

Mill I ::: hi :: h h = l lll'hhl :: hi I :: :lh I 
Qy 63 SLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRRIVSGNPRCQKP 122 
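
The summary figures printed with each hit can be cross-checked against one another. In the entries of this listing, "Best Local Similarity" agrees with Matches divided by the total number of alignment columns (Matches + Conservative + Mismatches + Indels). That relation is inferred from the printed numbers, not taken from the search program's documentation, and the function name below is hypothetical. A minimal Python sketch under that assumption:

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent of identical positions over all alignment columns.

    Assumed definition, inferred from the figures printed in this report.
    """
    columns = matches + conservative + mismatches + indels
    if columns == 0:
        raise ValueError("alignment has no columns")
    return 100.0 * matches / columns


# Counts printed for the 1480-residue SLIT entry (Score 322).
pct = best_local_similarity(matches=44, conservative=36, mismatches=38, indels=2)
print(f"{pct:.1f}%")
```

The same check can be repeated for any hit whose "Matches" line survived intact, which is a quick way to spot digits damaged in transcription.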



RESULT 3 

ID R29102 standard; Protein; 196 AA. 

AC R29102; 

DT 05-JAN-1993 (first entry) 

DE Drosophila SLIT protein Flank-LRR-Flank consensus sequence. 

KW Neurogenesis; EGF-like repeats; epidermal growth factor; TAGON;

KW embryonic CNS; leucine-rich repeat; midline glial cells; 

KW axonogenesis; cell-cell interaction; ss. 

OS Drosophila melanogaster. 

FH Key Location/Qualifiers 

FT region 1..32
FT /label= amino_flanking_region
FT region 33..135
FT /label= Leucine_rich_repeat_region
FT region 136..196
FT /label= carboxy_flanking_region

PN WO9210518-A. 

PD 25-JUN-1992.
PF 27-NOV-1991; U09055.
PR 07-DEC-1990; US-624135.
PA (UYYA ) UNIV YALE.

PI Artavanis-Tsakonas S, Rothberg JM; 

DR WPI; 92-234590/28. 

PT SLIT protein and sequence elements for treating 

PT neuro-degenerative disease - useful for Alzheimer's disease, 

PT nerve damage and Parkinson's disease, for diagnosis of cancer 

PS Claim 5; Page 95; 122pp; English. 

CC The SLIT protein is necessary for normal development of the midline
CC of the CNS, partic. the midline glial cells, and for the
CC concomitant formation of the commissural axon pathways. The process
CC is dependent on the level of SLIT protein expression. It appears
CC that SLIT protein is excreted by the midline glial cells where it
CC is synthesised and is eventually associated with the surfaces of
CC axons that traverse them. The SLIT protein is tightly localised to
CC the muscle attachment sites and to the sites of contact between
CC adjacent pairs of cardioblasts as they coalesce to form the lumen
CC of the larval heart. The SLIT protein defines a new set of
CC molecules (TAGONS) which play a key role in axon outgrowth and
CC pathfinding. SLIT contains 4 Flank-LRR-Flank domains (see R25079)
CC which mediate adhesive events and this consensus sequence is based
CC on them. Each of the 4 individual domain sequences is individually
CC claimed. See also Q25811.

SQ Sequence 196 AA; 

Query Match 24.7%; Score 246; DB 5; Length 196; 

Best Local Similarity 23.0%; Pred. No. 3.47e-13; 

Matches 31; Conservative 12; Mismatches 90; Indels 2; Gaps 2;

Db 48 fxxlxxlxxlxlxxnxixxlxxxxfxx-lxxlxxlilxxnxixxlxxxxlxxlxxlxxlx 106 

I : I I I : : II : I ; I 

Qy 4 FNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLS 63 

Db 107 lxxnxixxlxxxxfxxlxxlxxlxlxxnpfxcdcxlxw-lxxxxxxxxxxxxxxrcxxpx 165 

III: I I I : I III hi I I III 

Qy 64 LYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRRIVSGNPRCQKPF 123 
Db 166 xxxxxxixxlxxxxf 180
Qy 124 FLKEIPIQGVGHPGI 138



RESULT 4 




ID R71294 standard; Protein; 560 AA.
AC R71294;
DT 18-AUG-1995 (first entry)
DE Human glycoprotein V.
KW Glycoprotein V; GPV; platelet.
OS Homo sapiens.
FH Key Location/Qualifiers
FT peptide 1..16
FT /label= Sig_peptide
FT modified_site 51
FT /label= N-glycosylation_site
FT modified_site 181
FT /label= N-glycosylation_site
FT modified_site 244
FT /label= N-glycosylation_site
FT modified_site 267
FT /note= "N-glycosylation_site"
FT modified_site 298
FT /label= N-glycosylation_site
FT modified_site 312
FT /label= N-glycosylation_site
FT modified_site 385
FT /label= N-glycosylation_site
FT cleavage_site 476..477
FT /note= "putative thrombin cleavage site"
FT modified_site 499
FT /label= N-glycosylation_site
FT domain 520..544
FT /note= "putative transmembrane domain"
PN WO9502054-A.
PD 19-JAN-1995.
PF 07-JUL-1994; U07644.
PR 09-JUL-1993; US-089455.
PR 03-DEC-1993; US-162599.
PR 10-FEB-1994; US-195006.
PA (CORT-) COR THERAPEUTICS INC.
PI Cazenave J, Lanza F, Phillips DR;
DR WPI; 95-066899/09.
DR N-PSDB; Q85594.
PT Platelet glycoprotein V gene - useful for producing glycoprotein
PT V (GPV) and variants and generating antibodies to GPV
PS Disclosure; Page 45-50; 82pp; English.
CC Genomic clones were isolated from a human fibroblast library in
CC lambda Fix using a 748 bp 32P-labeled glycoprotein V (GPV) cDNA
CC probe. Exon-containing fragments from positive clones were
CC subcloned and sequenced. The full sequence of the human GPV gene
CC is given in Q85594.
SQ Sequence 560 AA;



Query Match 21.8%; Score 217; DB 12; Length 560; 

Best Local Similarity 34.4%; Pred. No. 1.79e-10;

Matches 43; Conservative 28; Mismatches 49; Indels 5; Gaps 4; 

Db 332 qgafqglgelqvlalhsngltalpdgllrg-lgklrqvslrrnrlralpralfrnlssle 390 

:IH I : :! I I :l I :: :|| |: |: : II I : :: I III: 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVR 60 

Db 391 svqldhnqletlpgdvfgalprltevllghnswrcdcglgpflg-wlrqhlglvggeepp 449 
: I I :| I: : I I: |:| ||: II III : :|:|: I 

Qy 61 LLSLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRR-IVSGN-P 117

Db 450 rcagp 454 

II I 

Qy 118 RCQKP 122 



RESULT 5 

ID W41641 standard; Protein; 1091 AA.

AC W41641; 

DT 27-APR-1998 (first entry) 

DE Sequence used in detection method. 

KW Detection; mouse; murine. 

OS Mus sp. 

PN J09107971-A. 

PD 28-APR-1997. 

PF 19-OCT-1995; 270822. 

PR 19-OCT-1995; JP-270822. 

PA (TANA ) TANABE SEIYAKU CO.

PA (TOYA/) TOYAMA S. 

DR WPI; 97-292464/27. 

DR N-PSDB; V04445. 

PT Detection of genes, useful for cloning genes of high and low 

PT expression - by homogenising prepared ds-cDNA pool from sample for 

PT comparison with each other to remove specific DNA fragment 

PS Claim 9; Pages 13-16; 18pp; Japanese. 

CC The present sequence was used in the development of a novel method 

CC for detecting genes showing difference in expression quantity 

CC between several samples. The method comprises preparing a double 

CC stranded cDNA pool of a standard organism sample, and homogenising 

CC the contents of each DNA fragment contained in the resultant cDNA 

CC pool to prepare a content homogenised standard cDNA pool. Double

CC stranded cDNA pools derived from each sample for several organism

CC samples are compared with each other, and the DNA fragment

CC associated with the DNA fragment in the cDNA pool from the content

CC homogenised standard cDNA pool is removed to give a remaining cDNA

CC pool for each of the samples to be compared. The method can clone 

CC gene groups of high and low expression level efficiently. 

SQ Sequence 1091 AA; 

Query Match 20.9%; Score 208; DB 27; Length 1091;

Best Local Similarity 33. U; Pred. No. 1.22e-09; 



[Alignment text not recoverable from source; only the row start positions survive.]

Db 350
Qy 1
Db 410
Qy 59
Db 469
Qy 118



RESULT 6 

ID R85888 standard; Protein; 605 AA. 

AC R85888; 

DT 13-SEP-1996 (first entry) 

DE WD-40 domain-contg. insulin-like growth factor binding protein.

KW WD40 repeat region; beta-transducin; protein-protein interaction; drug;

KW intracellular signalling; protein kinase C; homology; motif; modulator; 

KW receptors of activated protein kinase; enzyme activity; isozyme; human. 

OS Synthetic. 

PN WO9521252-A2.

PD 10-AUG-1995.

PF 31-JAN-1995; U01210.

PR 01-FEB-1994; US-190802. 

PA (STRD ) UNIV LELAND STANFORD JUNIOR. 

PI Mochly-Rosen D, Ron D; 

DR WPI; 95-283772/37. 

PT New WD-40 (beta-transducin)-derived polypeptide(s) - which alter the

PT activity of a protein, eg. protein kinase C, which interacts with a 

PT protein contg. a WD-40 region. 

PS Example 5; Page 122-125; 351pp; English. 

CC Proteins R85851-92 are proteins which contain at least one WD-40 (also
CC called beta-transducin homologous) amino acid repeat motif. The WD-40
CC regions are involved in protein-protein interactions between proteins
CC involved in intracellular signalling. An example of such an interaction
CC is between protein kinase C and receptors of activated protein kinase
CC (RACK), esp. RACK-1 (R85850). Proteins R85851-82 were isolated based on
CC homology with beta-transducin, whereas proteins R85882-92 were isolated
CC based on homology with the WD-40 consensus sequence (R85893). The
CC proteins were used to construct the peptides R84928-R85063 and
CC R85786-R85842. The peptides can be used to identify target proteins
CC contg. WD-40 motifs, as modulators of enzyme, esp. isozyme, activity of
CC proteins involved in protein-protein interaction and to screen for drugs
CC that will affect protein-protein interaction involving WD-40 domains.

SQ Sequence 605 AA; 

Query Match 18.5%; Score 184; DB 17; Length 605; 

Best Local Similarity 34.8%; Pred. No. 1.90e-07;

Matches 31; Conservative 27; Mismatches 28; Indels 3; Gaps 3; 

Db 187 daafrglgslrelvlagnrlaylqpalf-sglaelreldlsrnalra-ikanvfvqlprl 244 

"II I :|::ll:|:lhl :: I :||: |: I I |: I : : : I |: : 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLML-RSNLIGCVSNDTFAGLSSV 59

Db 245 qklyldrnliaavapgaflglkalrwldl 273



RESULT 7 

ID R85889 standard; Protein; 603 AA. 

AC R85889;

DT 13-SEP-1996 (first entry) 

DE WD-40 domain-contg. rat insulin-like growth factor binding protein.

KW WD40 repeat region; beta-transducin; protein-protein interaction; drug;

KW intracellular signalling; protein kinase C; homology; motif; modulator; 

KW receptors of activated protein kinase; enzyme activity; isozyme; human. 

OS Rattus rattus. 

PN WO9521252-A2.

PD 10-AUG-1995.

PF 31-JAN-1995; U01210.

PR 01-FEB-1994; US-190802. 

PA (STRD ) UNIV LELAND STANFORD JUNIOR. 

PI Mochly-Rosen D, Ron D;

DR WPI; 95-283772/37.

PT New WD-40 (beta-transducin)-derived polypeptide(s) - which alter the

PT activity of a protein, eg. protein kinase C, which interacts with a 

PT protein contg. a WD-40 region. 

PS Example 5; Page 125-128; 351pp; English. 

CC Proteins R85851-92 are proteins which contain at least one WD-40 (also
CC called beta-transducin homologous) amino acid repeat motif. The WD-40
CC regions are involved in protein-protein interactions between proteins
CC involved in intracellular signalling. An example of such an interaction
CC is between protein kinase C and receptors of activated protein kinase
CC (RACK), esp. RACK-1 (R85850). Proteins R85851-82 were isolated based on
CC homology with beta-transducin, whereas proteins R85882-92 were isolated
CC based on homology with the WD-40 consensus sequence (R85893). The
CC proteins were used to construct the peptides R84928-R85063 and
CC R85786-R85842. The peptides can be used to identify target proteins
CC contg. WD-40 motifs, as modulators of enzyme, esp. isozyme, activity of
CC proteins involved in protein-protein interaction and to screen for drugs
CC that will affect protein-protein interaction involving WD-40 domains.

SQ Sequence 603 AA; 

Query Match 18.0%; Score 179; DB 17; Length 603; 

Best Local Similarity 37.5%; Pred. No. 5.39e-07; 

Matches 39; Conservative 22; Mismatches 41; Indels 2; Gaps 2; 

Db 356 gafsglfnvavmnlsgnclrslpervfqg-ldklhslhlehsclghvrlhtfaglsglrr 414 

MM :| : Ml I :: I M I MM : :| I MMM 
Qy 2 GAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRL 61 

Db 415 lflrdnsissieeqslaglselleldlttnrlthlprqlfqglg 458 

I I II I::| ::: I I ::| :| : :| III 
Qy 62 LSLYDNRITTITPGAFTTLVSLSTINLLSNPFNC-NCHLGAGLG 104 
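
Each hit carries semicolon-separated summary lines such as "Query Match ...; Score ...; DB ...; Length ...;". When many hits need to be compared, these lines can be split into key/value pairs mechanically. The helper below is a hypothetical sketch, not part of the search program; it assumes the value is always the last whitespace-delimited token of each field.

```python
def parse_summary(line):
    """Split a 'Key value; Key value;' summary line into a dict of strings."""
    fields = {}
    for part in line.split(";"):
        tokens = part.strip().rsplit(None, 1)  # key text, then the final token
        if len(tokens) == 2:
            key, value = tokens
            fields[key] = value
    return fields


# Summary line printed for the R85889 hit.
row = parse_summary("Query Match 18.0%; Score 179; DB 17; Length 603;")
score = int(row["Score"])
length = int(row["Length"])
print(score, length)
```

Values are kept as strings so that damaged fields remain visible instead of raising an error during parsing; conversion to numbers is left to the caller.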



RESULT 8 

ID W39170 standard; Protein; 245 AA. 

AC W39170; 

DT 08-MAY-1998 (first entry) 

DE Human PKD1 protein fragment 1.
KW PKD1; polycystic polycystic renal degeneration; diagnosis; epitope;
KW therapy.
OS Homo sapiens.
PN DE19650758-C1.
PD 02-JAN-1998.

PF 06-DEC-1996; 050758. 

PR 06-DEC-1996; DE-050758. 

PA (DEKR-) DEUT KREBSFORSCHUNGSZENTRUM. 

PI Martens R, Velhagen I, Zentgraf H; 

DR WPI; 98-034057/04. 

PT PKD1 protein fragments, DNA and antibodies - useful for diagnosis
PT and therapy of polycystic renal degeneration
PS Claim 3; Page -; 5pp; German.
CC W39170-W39184 are fragments of the human polycyclin protein, PKD1.
CC These fragments contain epitopes recognised by PKD1-specific
CC antibodies and can be used in the detection and diagnosis of the
CC autosomal dominant condition, polycystic renal degeneration. W39170
CC corresponds to amino acids 26-270 of the mature PKD1 protein.

CC Note: This fragment does not appear in the specification but has 

CC been generated from the sequence represented in W23830. 

SQ Sequence 245 AA; 

Query Match 17.6%; Score 175; DB 28; Length 245;

Best Local Similarity 32.2%; Pred. No. 1.23e-06;

Matches 39; Conservative 29; Mismatches 47; Indels 6; Gaps 6; 

Db 27 sgrglrtl-gpalrip-adataldvshnllraldvgllanlsalaeldisnnkistleeg 84 

:| I I: I ::| : :| : lh : :| II:: I : :|:|:|: I 
Qy 16 TGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRITTITPG 75 

Db 85 ifanlfnlseinlsgnpfecdcgl-awlprwaeeqqvrvvqpeaatcagpgslagqpllg 143 

I: I :ll III :|||:|:| II I :| : |:| : : ( I I |: I 
Qy 76 AFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLR-KR-RIV-SGNPRCQKPFFLKEIPIQG 132 

Db 144 i 144 

Qy 133 V 133 



RESULT 9 

ID W39171 standard; Protein; 455 AA.
AC W39171;
DT 08-MAY-1998 (first entry)
DE Human PKD1 protein fragment 2.
KW PKD1; polycystic polycystic renal degeneration; diagnosis; epitope;
KW therapy.
OS Homo sapiens.
PN DE19650758-C1.

PD 02-JAN-1998. 

PF 06-DEC-1996; 050758. 

PR 06-DEC-1996; DE-050758. 

PA (DEKR-) DEUT KREBSFORSCHUNGSZENTRUM.
PI Martens R, Velhagen I, Zentgraf H;
DR WPI; 98-034057/04.
PT PKD1 protein fragments, DNA and antibodies - useful for diagnosis
PT and therapy of polycystic renal degeneration
PS Claim 3; Page -; 5pp; German.
CC W39170-W39184 are fragments of the human polycyclin protein, PKD1.
CC These fragments contain epitopes recognised by PKD1-specific
CC antibodies and can be used in the detection and diagnosis of the
CC autosomal dominant condition, polycystic renal degeneration. W39171
CC corresponds to amino acids 26-480 in the mature PKD1 protein.

CC Note: This fragment does not appear in the specification but has 

CC been generated from the sequence represented in W23830. 

SQ Sequence 455 AA; 

Query Match 17.6%; Score 175; DB 28; Length 455; 

Best Local Similarity 32.2%; Pred. No. 1.23e-06;

Matches 39; Conservative 29; Mismatches 47; Indels 6; Gaps 6;

Db 27 sgrglrtl-gpalrip-adataldvshnllraldvgllanlsalaeldisnnkistleeg 84 

:| 11:1 ::| : :| : ||: : :| II:: I : :|:|:|: I 
Qy 16 TGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRITTITPG 75 

Db 85 ifanlfnlseinlsgnpfecdcgl-awlprwaeeqqvrvvqpeaatcagpgslagqpllg 143 

I: I HI III :HI:|:| I I I :| : |:| : : I I. I hi 
Qy 76 AFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLR-KR-RIV-SGNPRCQKPFFLKEIPIQG 132

Db 144 i 144 

Qy 133 V 133 



RESULT 10 

ID W33396 standard; Protein; 4302 AA.
AC W33396;
DT 01-JUN-1998 (first entry)
DE Human PKD1 polypeptide.
KW Human; polycystic kidney disease 1; PKD1; treatment;
KW autosomal dominant polycystic kidney disease; APKD.
OS Homo sapiens.
PN WO9744457-A1.
PD 27-NOV-1997.
PF 22-MAY-1997; U08799.
PR 03-JUN-1996; US-658136.
PR 24-MAY-1996; US-655360.
PA (GENZ ) GENZYME CORP. 

PI Burn T, Connors T, Dackowski W, Germino G, Klinger K,

PI Qian F; 

DR WPI; 98-018511/02. 

DR N-PSDB; T94012. 

PT Human polycystic kidney disease gene, PKD1 - useful to treat and
PT diagnose human autosomal or adult onset polycystic kidney disease 
PS Claim 8; Pages 119-138; 257pp; English. 
CC The present sequence is the human polycystic kidney disease 1
CC (PKD1) polypeptide. The PKD1 cDNA or polypeptide may be used to
CC treat autosomal dominant polycystic kidney disease (APKD), and
CC identify carriers of mutant PKD1 genes, i.e. subjects susceptible
CC to APKD. Antibodies (Ab) that distinguish between normal and mutant
CC PKD1 sequences can also be used in diagnostic tests. Anti-PKD1 Ab
CC may also be used to perform subcellular and histochemical
CC localisation studies, and to block the function of PKD1. Ab are
CC also useful in rational drug design studies to identify and test
CC inhibitors of PKD1. Sense and antisense sequences derived from the
CC PKD1 gene may be used for detection and therapy.
SQ Sequence 4302 AA;

Query Match 17.6%; Score 175; DB 29; Length 4302; 

Best Local Similarity 32.2%; Pred. No. 1.23e-06;

Matches 39; Conservative 29; Mismatches 47; Indels 6; Gaps 6; 

Db 52 sgrglrtl-gpalrip-adataldvshnllraldvgllanlsalaeldisnnkistleeg 109 

:| I I: I :M : H : lh : :| II" I : :|:|:|: I 
Qy 16 TGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRITTITPG 75 

Db 110 ifanlfnlseinlsgnpfecdcgl-awlprwaeeqqvrvvqpeaatcagpgslagqpllg 168 

: I :N III :lll:|:| I I I :| : |:| : : I I I |: I 
Qy 76 AFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLR-KR-RIV-SGNPRCQKPFFLKEIPIQG 132

Db 169 i 169 

Qy 133 V 133 



RESULT 11 

ID W23830 standard; Protein; 4302 AA. 

AC W23830; 

DT 08-MAY-1998 (first entry) 

DE Human PKD1 protein.

KW PKD1; polycystic polycystic renal degeneration; diagnosis; epitope;

KW therapy. 



OS Homo sapiens. 



FH Key Location/Qualifiers
FT Region 26..270
FT /note= "contains epitope recognised by a
FT PKD1-specific antibody"
FT Region 26..480
FT /note= "contains epitope recognised by a
FT PKD1-specific antibody"
FT Region 361..540
FT /note= "contains epitope recognised by a
FT PKD1-specific antibody"
FT Region 480..700
FT /note= "contains epitope recognised by a
FT PKD1-specific antibody"
FT Region 541..840
FT /note= "contains epitope recognised by a
FT PKD1-specific antibody"
FT Region 700..1100
FT /note= "contains epitope recognised by a
FT PKD1-specific antibody"
FT Region 1011..1220
FT /note= "contains epitope recognised by a
FT PKD1-specific antibody"
FT Region 2161..2370
FT /note= "contains epitope recognised by a
FT PKD1-specific antibody"
FT Region 2723..2931
FT /note= "contains epitope recognised by a
FT PKD1-specific antibody"
FT Region 2850..3000
FT /note= "contains epitope recognised by a
FT PKD1-specific antibody"
FT Region 2932..3067
FT /note= "contains epitope recognised by a
FT PKD1-specific antibody"
FT Region 3100..3280
FT /note= "contains epitope recognised by a
FT PKD1-specific antibody"
FT Region 3200..3400
FT /note= "contains epitope recognised by a
FT PKD1-specific antibody"
FT Region 3311..3603
FT /note= "contains epitope recognised by a
FT PKD1-specific antibody"
FT Region 4090..4302
FT /note= "contains epitope recognised by a
FT PKD1-specific antibody"



PN DE19650758-C1. 

PD 02-JAN-1998.
PF 06-DEC-1996; 050758.
PR 06-DEC-1996; DE-050758.
PA (DEKR-) DEUT KREBSFORSCHUNGSZENTRUM.
PI Martens R, Velhagen I, Zentgraf H;
DR WPI; 98-034057/04.
PT PKD1 protein fragments, DNA and antibodies - useful for diagnosis
PT and therapy of polycystic renal degeneration
PS Disclosure; Page -; 5pp; German.
CC This sequence represents a human polycyclin, PKD1. This protein is used
CC to generate fragments (see W39170-W39184) which contain epitopes
CC recognised by PKD1-specific antibodies and can be used in the detection

CC and diagnosis of the autosomal dominant condition, polycystic renal 

CC degeneration. 

SQ Sequence 4302 AA; 

Query Match 17.6%; Score 175; DB 28; Length 4302;

Best Local Similarity 32.2%; Pred. No. 1.23e-06;

Matches 39; Conservative 29; Mismatches 47; Indels 6; Gaps 6; 

Db 52 sgrglrtl-gpalrip-adataldvshnllraldvgllanlsalaeldisnnkistleeg 109 

:| I I: I ::| : :| : ||: : :| ||:: I : :|:|:|: I 
Qy 16 TGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRITTITPG 75

Db 110 ifanlfnlseinlsgnpfecdcgl-awlprwaeeqqvrvvqpeaatcagpgslagqpllg 168

I: I :ll ill : 1 1 1 : 1 : 1 MM : |:| : : I I | |: |
Qy 76 AFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLR-KR-RIV-SGNPRCQKPFFLKEIPIQG 132

Db 169 i 169
Qy 133 V 133 



RESULT 12 

ID W00870 standard; Protein; 4302 AA. 
AC W00870; 

DT 02-FEB-1997 (first entry) 
DE Polycystic kidney disease 1 (PKD1) polypeptide. 

KW Adult polycystic kidney disease; APKD; PKD1 gene; diagnosis;
KW therapy; polycystin.
OS Homo sapiens.
FH Key Location/Qualifiers
FT peptide 1..23
FT /label= Sig_peptide
FT protein 24..4303
FT /label= Mat_protein
FT region 72..125
FT /label= LRR
FT /note= "leucine-rich repeat region"
FT domain 2580..2600
FT /label= TM1
FT /note= "transmembrane domain 1"
FT domain 2693..2713
FT /label= TM2
FT /note= "transmembrane domain 2"
FT domain 3075..3095
FT /label= TM3
FT /note= "transmembrane domain 3"
FT domain 3281..3301
FT /label= TM4
FT /note= "transmembrane domain 4"
FT domain 3323..3343
FT /label= TM5
FT /note= "transmembrane domain 5"
FT domain 3559..3579
FT /label= TM6
FT /note= "transmembrane domain 6"
FT domain 3582..3612
FT /label= TM7
FT /note= "transmembrane domain 7"
FT domain 3669..3689
FT /label= TM8
FT /note= "transmembrane domain 8"
FT misc_difference 50
FT /label= N-glycosylation_site
FT misc_difference 89
FT /label= N-glycosylation_site
FT misc_difference 107
FT /label= N-glycosylation_site
FT misc_difference 112
FT /label= N-glycosylation_site
FT misc_difference 187
FT /label= N-glycosylation_site
FT misc_difference 621
FT /label= N-glycosylation_site
FT misc_difference 632
FT /label= N-glycosylation_site
FT misc_difference 746
FT /label= N-glycosylation_site
FT misc_difference 810
FT /label= N-glycosylation_site
FT misc_difference 841
FT /label= N-glycosylation_site
FT misc_difference 854
FT /label= N-glycosylation_site
FT misc_difference 890
FT /label= N-glycosylation_site
FT misc_difference 921
FT /label= N-glycosylation_site
FT misc_difference 1004
FT /label= N-glycosylation_site
FT misc_difference 1034
FT /label= N-glycosylation_site
FT misc_difference 1072
FT /label= N-glycosylation_site
FT misc_difference 1113
FT /label= N-glycosylation_site
FT misc_difference 1178
FT /label= N-glycosylation_site
FT misc_difference 1194
FT /label= N-glycosylation_site
FT misc_difference 1240
FT /label= N-glycosylation_site
FT misc_difference 1269
FT /label= N-glycosylation_site
FT misc_difference 1336
FT /label= N-glycosylation_site
FT misc_difference 1348
FT /label= N-glycosylation_site
FT misc_difference 1382
FT /label= N-glycosylation_site
FT misc_difference 1450
FT /label= N-glycosylation_site
FT misc_difference 1455
FT /label= N-glycosylation_site
FT misc_difference 1474
FT /label= N-glycosylation_site
FT misc_difference 1518
FT /label= N-glycosylation_site
FT misc_difference 1541
FT /label= N-glycosylation_site
FT misc_difference 1554
FT /label= N-glycosylation_site
FT misc_difference 1563
FT /label= N-glycosylation_site
FT misc_difference 1647
FT /label= N-glycosylation_site
FT misc_difference 1661
FT /label= N-glycosylation_site
FT misc_difference 1733
FT /label= N-glycosylation_site
FT misc_difference 1791
FT /label= N-glycosylation_site
FT misc_difference 1834
FT /label= N-glycosylation_site
FT misc_difference 1867
FT /label= N-glycosylation_site
FT misc_difference 1880
FT /label= N-glycosylation_site
FT misc_difference 1991
FT /label= N-glycosylation_site
FT misc_difference 2050
FT /label= N-glycosylation_site
FT misc_difference 2074
FT /label= N-glycosylation_site
FT misc_difference 2125
FT /label= N-glycosylation_site
FT misc_difference 2248
FT /label= N-glycosylation_site
FT misc_difference 2353
FT /label= N-glycosylation_site
FT misc_difference 2395
FT /label= N-glycosylation_site
FT misc_difference 2412
FT /label= N-glycosylation_site
FT misc_difference 2567
FT /label= N-glycosylation_site
FT misc_difference 2578
FT /label= N-glycosylation_site
FT misc_difference 2645
FT /label= N-glycosylation_site
FT misc_difference 2718
FT /label= N-glycosylation_site
FT misc_difference 2754
FT /label= N-glycosylation_site
FT misc_difference 2818
FT /label= N-glycosylation_site
FT misc_difference 2841
FT /label= N-glycosylation_site
FT misc_difference 2878
FT /label= N-glycosylation_site
FT misc_difference 2925
FT /label= N-glycosylation_site
FT misc_difference 2956
FT /label= N-glycosylation_site
FT misc_difference 2994
FT /label= N-glycosylation_site
FT misc_difference 3737
FT /label= N-glycosylation_site
FT misc_difference 3789
FT /label= N-glycosylation_site
FT misc_difference 3844
FT /label= N-glycosylation_site

PN WO9534649-A2.

PD 21-DEC-1995. 

PF 13-JUN-1995; G01386. 

PR 14-JUN-1994; GB-011900. 

PR 23-DEC-1994; WO-G02822. 

PR 13-APR-1995; GB-007766. 

PR 14-APR-1995; US-422582. 

PA (MEDI-) MEDICAL RES COUNCIL.
PA (UYLE-) RIJKSUNIV LEIDEN.
PA (UYRO-) UNIV ROTTERDAM ERASMUS.
PA (UYWA-) UNIV WALES COLLEGE OF MEDICINE.
PI Breuning MH, Halley DJJ, Harris PC, Hesseling ALW;
PI Hughes J, Janssen LAJ, Nellist MD, Peral B, Peters DJM;

PI Roelfsema JH, Sampson J, Ward CJ; 

DR WPI; 96-049678/05. 

DR N-PSDB; T13821. 

PT Isolated polycystic kidney disease I gene and its deletion mutants 

PT - useful in diagnosis and treatment of PKD1-associated disease and
PT in gene therapy
PS Claim 16; Fig 15; 181pp; English.
CC PKD1 polypeptide (W00870), designated polycystin, has a role in the
CC prevention or suppression of adult polycystic kidney disease (APKD).
CC It is the product of the PKD1 gene (T13821) located on human
CC chromosome 16. The gene is mutated in PKD1 patients. The PKD1 gene
CC can be used to produce PKD1 polypeptide in transformed host
CC cells. The polypeptide is useful in the diagnosis and treatment
CC of PKD1-associated diseases, and to raise antibodies.

SQ Sequence 4302 AA; 

Query Match 17.6%; Score 175; DB 19; Length 4302; 

Best Local Similarity 32.2%; Pred. No. 1.23e-06;



[Alignment text not recoverable from source; only the row start positions survive.]

Db 52
Qy 16
Db 110
Qy 76
Db 169
Qy 133



RESULT 13 

ID R75916 standard; Protein; 4339 AA.
AC R75916;
DT 14-APR-1996 (first entry)
DE Polycystic kidney disease 1 gene product.
KW Autosomal dominant polycystic kidney disease; ADPKD;
KW polycystic kidney disease 1 gene; PKD1; diagnostic; therapy.
OS Homo sapiens.
PN WO9518225-A1.
PD 06-JUL-1995.
PF 23-DEC-1994; G02822.
PR 24-DEC-1993; GB-026470.
PR 14-JUN-1994; GB-011900.
PA (MEDI-) MEDICAL RES COUNCIL.
PA (UYLE-) RIJKSUNIV LEIDEN.
PA (UYRO-) UNIV ROTTERDAM ERASMUS.
PA (UYWA-) UNIV WALES COLLEGE OF MEDICINE.
PI Breuning MH, Halley DJJ, Harris PC, Hesseling ALW;
PI Hughes J, Janssen LAJ, Nellist MD, Peral B, Peters DJM;
PI Roelfsema JH, Sampson J, Ward CJ;
DR WPI; 95-246390/32.
DR N-PSDB; Q91438.
PT Isolated polycystic kidney disease 1 gene and its mutants - useful
PT for treatment and diagnosis of autosomal dominant polycystic kidney
PT disease
PS Disclosure; Fig 10; 119pp; English.
CC A novel protein (R75916) is encoded by the polycystic kidney disease
CC 1 (PKD1) gene (see Q91438), which maps to 16p13.3. Mutations at this
CC locus are associated with autosomal dominant polycystic kidney disease
CC (ADPKD). The protein can be used to screen actual or suspected
CC ADPKD patients for normal or mutated PKD1 polypeptide, or is
CC used to treat or prevent PKD1-associated disorders such as ADPKD by
CC administration to affected cells.
SQ Sequence 4339 AA;

Query Match 17.6%; Score 175; DB 15; Length 4339;

Best Local Similarity 32.2%; Pred. No. 1.23e-06;

Matches 39; Conservative 29; Mismatches 47; Indels 6;

Db 9 sgrglrtl-gpalrip-adataldvshnllraldvgllanlsalaeldisnnkistleeg 66

:! I I: I : = l : :| : Ih : :| II |:|:|: I

Qy 16 TGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRITTITPG 75

Db 67 ifanlfnlseinlsgnpfecdcgl-awlprwaeeqqvrvvqpeaatcagpgslagqpllg 125

I: I :H III :IH:|:| I I I :| |: I

Qy 76 AFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLR-KR-RIV-SGNPRCQKPFFLKEIPIQG 132

Db 126 i 126

Qy 133 V 133




RESULT 14 




ID R87539 standard; Protein; 4339 AA.
AC R87539;
DT 02-FEB-1997 (first entry)
DE Polycystic kidney disease 1 polypeptide (polycystin).
KW Adult polycystic kidney disease; APKD; PKD1 gene; diagnosis;
KW therapy; polycystin.
OS Homo sapiens.
PN WO9534649-A2.
PD 21-DEC-1995.
PF 13-JUN-1995; G01386.
PR 14-JUN-1994; GB-011900.
PR 23-DEC-1994; WO-G02822.
PR 13-APR-1995; GB-007766.
PR 14-APR-1995; US-422582.
PA (MEDI-) MEDICAL RES COUNCIL.
PA (UYLE-) RIJKSUNIV LEIDEN.
PA (UYRO-) UNIV ROTTERDAM ERASMUS.
PA (UYWA-) UNIV WALES COLLEGE OF MEDICINE.
PI Breuning MH, Halley DJJ, Harris PC, Hesseling ALW;
PI Hughes J, Janssen LAJ, Nellist MD, Peral B, Peters DJM;
PI Roelfsema JH, Sampson J, Ward CJ;
DR WPI; 96-049678/05.
DR N-PSDB; T08807.
PT Isolated polycystic kidney disease I gene and its deletion mutants
PT - useful in diagnosis and treatment of PKD1-associated disease and
PT in gene therapy
PS Claim 18; Fig 10; 181pp; English.
CC PKD1 polypeptide (R87539) is encoded by a partial cDNA clone
CC (T08803) corresponding to the complete human PKD1 gene (see also
CC T13821) apart from its extreme 5' end. This gene is associated
CC with adult polycystic kidney disease (APKD). The polypeptide can
CC be produced in transformed host cells for use in the diagnosis of
CC PKD1-associated diseases, for the detection of disease carriers,
CC and for the treatment or prevention of these diseases. The PKD1
CC polypeptide plays a role in the suppression or prevention of APKD.
SQ Sequence 4339 AA;

Query Match 17.6%; Score 175; DB 19; Length 4339;

Best Local Similarity 32.2%; Pred. No. 1.23e-06;

Matches 39; Conservative 29; Mismatches 47; Indels 6; Gaps 6;

Db 9 sgrglrtl-gpalrip-adataldvshnllraldvgllanlsalaeldisnnkistleeg 66

:| I I: I "I : :| : ||: : :| ||:: I : :|:|:|: |
Qy 16 TGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRITTITPG 75

Db 67 ifanlfnlseinlsgnpfecdcgl-awlprwaeeqqvrvvqpeaatcagpgslagqpllg 125
I: I HI Ill :|lhl:| I I I :| : |;| : : I I |: I
Qy 76 AFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLR-KR-RIV-SGNPRCQKPFFLKEIPIQG 132

Qy 133 V 133



ID R90302 standard; Protein; 4303 AA.
AC R90302;
DT 18-JUL-1996 (first entry)
DE Polycystic kidney disease 1 gene product.
KW Polycystic kidney disease; PKD1; autosomal dominant; ADPKD; mutation;
KW exon; short arm; chromosome 16; repeated region; alternative splicing;
KW extracellular matrix proteins; antibody; detection; diagnosis;
KW mini gene therapy.
OS Homo sapiens.
FH Key Location/Qualifiers
FT peptide 1..23
FT /note= "Signal peptide"
FT peptide 32..65
FT /note= "Leucine-rich repeat cysteine-rich N-terminus"
FT domain 72..94
FT /note= "Leucine-rich repeat domain, LRR1"
FT domain 97..119
FT /note= "Leucine-rich repeat domain, LRR2"
FT peptide 123..177
FT /note= "Leucine-rich repeat cysteine-rich C-terminus"
FT domain 282..353
FT /note= "PKD domain, R1"
FT binding_site 405..533
FT /note= "C-type lectin binding domain"
FT domain 639..671
FT /note= "LDL module, LDL-A"
FT domain 1032..1124
FT /note= "PKD domain, R2"
FT domain 1138..1209
FT /note= "PKD domain, R3"
FT domain 1221..1292
FT /note= "PKD domain, R4"
FT domain 1305..1377
FT /note= "PKD domain, R5"
FT domain 1390..1463
FT /note= "PKD domain, R6"
FT domain 1477..1545
FT /note= "PKD domain, R7"
FT domain 1559..1629
FT /note= "PKD domain, R8"
FT domain 1643..1715
FT /note= "PKD domain, R9"
FT domain 1729..1799
FT /note= "PKD domain, R10"
FT domain 1815..1884
FT /note= "PKD domain, R11"
FT domain 1898..1968
FT /note= "PKD domain, R12"
FT domain 1983..2058
FT /note= "PKD domain, R13"
FT domain 2071..2142
FT /note= "PKD domain, R14"

PN WO9534573-A1.

PD 21-DEC-1995. 

PF 02-JUN-1995; U07079. 

PR 03-JUN-1994; DS-253524. 

PR 30-MAR-1995; US-413580. 

PA (BGHM-) BRIGHAM & WOMENS HOSPITAL.

PA (MILL-) MILLENIUM PHARM INC. 

PI Glucksmann.S, Reeders S, Schneider M; 

DR WPI; 96-049618/05. 

DR N-PSDB; T11708. 

PT DNA encoding polycystic kidney disease gene product - for use in

PT gene therapy of ADPKD, and in the evaluation of treatment for PKD

PS Claim 2; Fig 6; 126pp; English.

CC This sequence represents a polycystic kidney disease (PKD1) product 

CC which is associated with autosomal dominant polycystic kidney 

CC disease (ADPKD). Mutations within the PKD1 gene are responsible for

CC approx. 90% of cases of ADPKD. The coding region of the PKD1 gene

CC is complex and extensive. It covers approx. 60 kb and contains a

CC total of 46 exons. It has been localised to within a 750 kb chromosomal

CC region on the short arm of chromosome 16. Approximately the first two

CC thirds of the PKD1 gene is duplicated several times in a transcribed

CC fashion elsewhere in the genome. The PKD1 gene also contains extensive

CC repeated regions of high GC content. A number of the exons have

CC alternatively spliced forms giving rise to a number of cDNA clones.

CC The PKD1 protein contains at least 5 distinct peptide domains which

CC are likely to be involved in protein-protein and/or protein-

CC carbohydrate interactions. It also shares amino acid similarity

CC with a number of extracellular matrix proteins. Antibodies raised

CC against the PKD1 protein may be used in the detection of mutant PKD1

CC and, therefore, diagnosis of ADPKD. Fragments of the PKD1 gene may be

CC used in "mini" gene therapy for the treatment of ADPKD. 

SQ Sequence 4303 AA; 

Query Match 16.8%; Score 167; DB 17; Length 4303; 

Best Local Similarity 32.2%; Pred. No. 6.42e-06; 

Matches 39; Conservative 27; Mismatches 49; Indels 6; Gaps 6; 

Db 52 sgrglrtl-gpalrip-adateldvshnllraldvgllanlsalaeldisnnkistleeg 109 

:| I I: I ::| : I : ||: : :| ||:; | : :|:|:|: | 
Qy 16 TGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRITTITPG 75 

Db 110 ifanlfnlseinlsgnpfecdcgl-awlpqwaeeqqvrvvqpeaatcagpgslagqpllg 168

I: I :M III : I ■ I : I : I MM' : |:| : : I I I |: | 
Qy 76 AFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLR-KR-RIV-SGNPRCQKPFFLKEIPIQG 132



Db 169 i 169 
Qy 133 V 133 



Search completed: Fri May 28 08:47:16 1999 
Job time : 67 secs.
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Release 3.1A John F. Collins, Biocomputing Research Unit. 
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

:rch_pp protein - protein database search, using Smith-Waterman algorithm

Run on:         Fri May 28 08:47:53 1999; MasPar time 16.43 Seconds
                336.623 Million cell updates/sec

Tabular output not generated.

Title:          MJS-09-191-647-4
Description:    (1-138) from US09191647.pep
Perfect Score:  995
Sequence:       1 EGAFNGAASVQELMLTGNQL CQKPFFLREIPIQGVGHPGI 138

Scoring table:  PAM 150
                Gap 11

Searched:       122810 seqs, 40068593 residues
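
The run header names the Smith-Waterman algorithm with a PAM 150 scoring table and a gap penalty of 11. A minimal sketch of the local-alignment recurrence follows, under stated assumptions: the PAM 150 table is not reproduced in this listing, so a hypothetical `toy_score` function (identity +5, mismatch -4) stands in for it, and the gap penalty is treated as a flat per-residue cost, which may differ from how the search program applies it. The scores it produces are therefore not comparable with the Score column of this report.

```python
def smith_waterman_score(query, target, score, gap=11):
    """Return the best local alignment score between two sequences.

    Assumes a flat per-residue gap penalty; one matrix cell is filled
    for every (query residue, target residue) pair.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            cell = max(
                0,                          # start a new local alignment
                prev[j - 1] + score(q, t),  # align q with t
                prev[j] - gap,              # gap in the target
                curr[j - 1] - gap,          # gap in the query
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def toy_score(a, b):
    """Hypothetical stand-in for the PAM 150 table (not the real matrix)."""
    return 5 if a == b else -4


if __name__ == "__main__":
    # Two short fragments taken from the alignments in this listing.
    print(smith_waterman_score("NPFNCNCHL", "NPFECDCGL", toy_score))
```

Because every query residue is compared with every database residue, the number of cells grows as query length times database size, which is presumably what the "cell updates/sec" figure in the header measures.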



Post-processing: Minimum Match 0%

Listing first 45 summaries

Database: pir60

1:pir1 2:pir2 3:pir3 4:pir4

Statistics: Mean 42.283; Variance 84.876; scale 0.498

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



                               SUMMARIES

Result          Query
   No.  Score   Match  Length  DB  ID      Description            Pred. No.
     1    326    32.8    1469      B36665  slit protein 2 precur  3.15e-40
     2    326    32.8    1480      A36665  slit protein 1 precur  3.15e-40
     3    217    21.8     560      A60164  platelet membrane gly  3.68e-20
     4    208    20.9    1091      A58532  glial cell membrane g  1.40e-18
     5    195    19.6     907      JE0176  orphan G protein-coup  2.54e-16
     6    191    19.2     361      A53860  chondroadherin precur  1.24e-15
     7    185    18.6     605      JC5239  insulin-like growth f  1.30e-14
     8    184    18.5     605      A41915  insulin-like growth f  1.93e-14
     9    183    18.4     536      A34901  lysine carboxypeptida  2.85e-14
    10    179    18.0     603      JC1282  insulin-like growth f  1.35e-13
    11    175    17.6    4302      A38971  polycystic kidney dis  6.33e-13
    12    172    17.3     682      A43318  connectin precursor -  2.00e-12
    13    172    17.3     682      A49121  cell-surface molecule  2.00e-12
    14    166    16.7     603      JC6128  insulin-like growth f  1.97e-11
    15    165    16.6     420      A53531  oncofetal trophoblast  2.88e-11
    16    162    16.3     313      G02020  p37NB - human          8.95e-11
    17    158    15.9    1115      S40241  G protein-coupled rec  4.01e-10
    18    150    15.1    1535      S46224  peroxidasin - fruit f  7.73e-09
    19    149    15.0    1134      A29944  chaoptin precursor -   1.12e-08
    20    144    14.5     312      NBHUA2  leucine-rich alpha-2-  6.87e-08
    21    140    14.1    1025      A57676  protein kinase Xa21 (  2.90e-07
    22    139    14.0     360      S68209  sds22 protein homolog  4.14e-07
    23    138    13.9     343      A41748  lumican precursor - c  5.91e-07
    24    138    13.9     626   1  NBHUIA  platelet glycoprotein  5.91e-07
    25    136    13.7     360   2  I47020  decorin - rabbit       1.20e-06
    26    136    13.7     661   2  I56258  RP105 - mouse          1.20e-06
    27    136    13.7    1097   2  A29943  Toll protein precurso  1.20e-06
    28    135    13.6     359   1  NBHUC8  decorin precursor - h  1.71e-06
    29    134    13.5     176   1  A46606  platelet glycoprotein  2.43e-06
    30    134    13.5     662   2  S42799  garp precursor - huma  2.43e-06
    31    133    13.4     230   2  I46918  leucine-rich glycopro  3.45e-06
    32    132    13.3     360   2  S06280  decorin precursor - b  4.90e-06
    33    132    13.3     684   2  T01267  leucine-rich repeat t  4.90e-06
    34    132    13.3     942   1  JQ1674  protein kinase TMK1 (  4.90e-06
    35    129    13.0     206   1  NBHUIB  platelet glycoprotein  1.39e-05
    36    129    13.0     411   1  I55604  platelet glycoprotein  1.39e-05
    37    128    12.9     863   2  A55173  cf-9 protein precurso  1.96e-05
    38    127    12.8     690   2  T01183  protein kinase homolo  2.77e-05
    39    126    12.7     852   2  I51259  tyrosine kinase C rec  3.90e-05
    40    125    12.6     368   1  BGHUN   biglycan precursor -   5.49e-05
    41    125    12.6     369   2  S32793  biglycan precursor -   5.49e-05
    42    125    12.6     369   2  S20811  proteoglycan I - mous  5.49e-05
    43    125    12.6     369   2  S32559  biglycan precursor -   5.49e-05
    44    124    12.5     350   2  E71373  probable regulatory p  7.72e-05
    45    124    12.5     354   2  A55454  decorin precursor - m  7.72e-05
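
The Query Match percentages in the summaries appear to be each hit's Score divided by the Perfect Score of 995 given in the run header. This relationship is inferred from the printed numbers, not stated by the program. A minimal sketch that reproduces a few of the listed values, with `query_match_percent` as an illustrative name:

```python
PERFECT_SCORE = 995  # "Perfect Score" from the run header


def query_match_percent(score, perfect_score=PERFECT_SCORE):
    """Score as a percentage of the query's self-match score (inferred)."""
    return round(100.0 * score / perfect_score, 1)


if __name__ == "__main__":
    # (ID, Score) pairs copied from the summaries.
    for hit_id, score in [("B36665", 326), ("A60164", 217), ("A38971", 175)]:
        print(hit_id, score, query_match_percent(score))
```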



RESULT 1
ENTRY B36665 #type complete
TITLE slit protein 2 precursor - fruit fly (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change 16-Dec-1998
ACCESSIONS B36665
REFERENCE A36665
   #authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.; Artavanis-Tsakonas, S.
   #journal Genes Dev. (1990) 4:2169-2187
   #title slit: an extracellular protein necessary for development of
      midline glia and commissural axon pathways contains both
      EGF and LRR domains.
   #cross-references MUID:91099665
   #accession B36665
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-1469 ##label ROT
   ##cross-references GB:X53959
GENETICS
   #gene FlyBase:sli
   ##cross-references FlyBase:FBgn0003425
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF
   homology; leucine-rich alpha-2-glycoprotein repeat
   homology; proteoglycan carboxyl-terminal homology
FEATURE
   66-91 #domain proteoglycan amino-terminal homology #label PAH1\
   101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\
   288-313 #domain proteoglycan amino-terminal homology #label PAH2\
   323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
   450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\
   512-537 #domain proteoglycan amino-terminal homology #label PAH3\
   547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
   572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
   596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
   620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
   651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\
   708-733 #domain proteoglycan amino-terminal homology #label PAH4\
   743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
   767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
   846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\
   1028-1061 #domain EGF homology #label EGF
SUMMARY #length 1469 #molecular-weight 164695 #checksum 8361

Query Match 32.8%; Score 326; DB 2; Length 1469;

Best Local Similarity 37.5%; Pred. No. 3.15e-40;

Matches 45; Conservative 35; Mismatches 38; Indels 2; Gaps 2;

Db 365 ALSGLKQLTTLVLYGNKIKDLPSGVFKG-LGSLRLLLLNANEISCIRKDAFRDLHSLSLL 423

: hi II : .: : hi |::|: hi :| hh :|:| I h II 
Qy 3 AFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLL 62

Db 424 SLYDNNIQSLANGTFDAMKSMKTVHLAKNPFICDCNLRW-LADYLHKNPIETSGARCESP 482

Mill I ::: hi :: h h:| III hhl h hi I :: :||: I 
Qy 63 SLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRRIVSGNPRCQKP 122



RESULT 2
ENTRY A36665 #type complete
TITLE slit protein 1 precursor - fruit fly (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change 24-Sep-1998
ACCESSIONS A36665; S13523
REFERENCE A36665
   #authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.; Artavanis-Tsakonas, S.
   #journal Genes Dev. (1990) 4:2169-2187
   #title slit: an extracellular protein necessary for development of
      midline glia and commissural axon pathways contains both
      EGF and LRR domains.
   #cross-references MUID:91099665
   #accession A36665
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-1480 ##label ROT
   ##cross-references GB:X53959; NID:g8614; PID:g8615
GENETICS
   #gene FlyBase:sli
   ##cross-references FlyBase:FBgn0003425
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF
   homology; leucine-rich alpha-2-glycoprotein repeat
   homology; proteoglycan carboxyl-terminal homology
KEYWORDS alternative splicing
FEATURE
   66-91 #domain proteoglycan amino-terminal homology #label PAH1\
   101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\
   288-313 #domain proteoglycan amino-terminal homology #label PAH2\
   323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
   450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\
   512-537 #domain proteoglycan amino-terminal homology #label PAH3\
   547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
   572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
   596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
   620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
   651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\
   708-733 #domain proteoglycan amino-terminal homology #label PAH4\
   743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
   767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
   791-814 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR17\
   815-838 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR18\
   846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\
   1028-1061 #domain EGF homology #label EGF
SUMMARY #length 1480 #molecular-weight 165751 #checksum 900



Query Match 32.8%; Score 326; DB 2; Length 1480;

Best Local Similarity 37.5%; Pred. No. 3.15e-40; 

Matches 45; Conservative 35; Mismatches 38; Indels 2; Gaps 2; 

Db 365 ALSGLKQLTTLVLYGNKIKDLPSGVFKG-LGSLRLLLLNANEISCIRKDAFRDLHSLSLL 423 

h:| : hi II : :' : hi h:h hi :| hh :|:| I h II 
Qy 3 AFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLL 62 

Db 424 SLYDNNIQSLANGTFDAMKSMKTVHLAKNPFICDCNLRW-LADYLHKNPIETSGARCESP 482 

Mill I ::: hi :: h h:| III hhl I: hi I :: :lh I 
Qy 63 SLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRRIVSGNPRCQKP 122 
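
The "Best Local Similarity" figure printed with each alignment seems to equal the identical matches divided by the total number of alignment columns (matches, conservative substitutions, mismatches and indels). This is an inference from the printed counts, not a documented definition. A short sketch that checks the relationship against three alignments in this listing, using the illustrative name `best_local_similarity`:

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Identical matches as a percentage of all alignment columns (inferred)."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


if __name__ == "__main__":
    # Counts copied from the alignment summaries in this listing.
    print(best_local_similarity(45, 35, 38, 2))  # slit protein hits (B36665, A36665)
    print(best_local_similarity(43, 28, 49, 5))  # A60164
    print(best_local_similarity(39, 29, 47, 6))  # PKD1 polypeptide, Length 4339
```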



ENTRY A60164 #type complete
TITLE platelet membrane glycoprotein V precursor - human
ORGANISM #formal_name Homo sapiens #common_name man
DATE 12-Jan-1993 #sequence_revision 24-Feb-1994 #text_change 17-Mar-1999
ACCESSIONS A48030; A60164; A35483; B35483; C35483; A60432; A47507; S34329
REFERENCE A48030
   #authors Lanza, F.; Morales, M.; de La Salle, C.; Cazenave, J.P.;
      Clemetson, K.J.; Shimomura, T.; Phillips, D.R.
   #journal J. Biol. Chem. (1993) 268:20801-20807
   #title Cloning and characterization of the gene encoding the human
      platelet glycoprotein V. A member of the leucine-rich
      glycoprotein family cleaved during thrombin-induced
      platelet activation.
   #cross-references MUID:94012616
   #accession A48030
   ##molecule_type DNA
   ##residues 1-560 ##label LA2
   ##cross-references EMBL:Z23091; NID:g312501; PID:g312502
REFERENCE A60164
   #authors Shimomura, T.; Fujimura, K.; Maehama, S.; Takemoto, M.; Oda,
      K.; Fujimoto, T.; Oyama, R.; Suzuki, M.; Ichihara-Tanaka,
      K.; Titani, K.; Kuramoto, A.
   #journal Blood (1990) 75:2349-2356
   #title Rapid purification and characterization of human platelet
      glycoprotein V: the amino acid sequence contains
      leucine-rich repetitive modules as in glycoprotein Ib.
   #cross-references MUID:90275263
   #accession A60164
   ##molecule_type protein
   ##residues 365-384, 'X' , 386-390, 'X' ,392-395, 'X' ,397; 188-208, 'I' ,210;
      27-50, 'X' , 52-53; 174-180, 'X' , 182 - 187 ; 121- 144 ;145-172;
      290-297, 'X' ,299-311, 'X' ,313-326, 'I';142-151, 'X',
      153-163; 'YNTPDRXLAXYGGF'; 81-105, 'XX', 108, T;61-72,
      'TK', 75-77; 'V ,56-57 ;'G\ 479-487, 'X', 489-498, T, 500,
      'X', 502-503, 'X', 505, 'X', 507-508, 'D' ##label SHI
REFERENCE A35483
   #authors Roth, G.J.; Church, T.A.; McMullen, B.A.; Williams, S.A.
   #journal Biochem. Biophys. Res. Commun. (1990) 170:153-161
   #title Human platelet glycoprotein V: a surface leucine-rich
      glycoprotein related to adhesion.
   #cross-references MUID:90321220
   #accession A35483
   ##molecule_type protein
   ##residues 145-166, T, 168-169, 'X',171-172 ##label ROT
   ##note this proteolytic fragment was designated peptide M392
   #accession B35483
   ##molecule_type protein
   ##residues 121-129, 'W',131-135;466-468,'X', 470 ##label RO2
   ##note this material was designated peptide M393 but may
      contain two peptides
   #accession C35483
   ##molecule_type protein
   ##residues 252-266, T, 268-272, 'X', 274-279, 'I', 281-284, 'I', 286
      ##label RO3
   ##note this proteolytic fragment was designated peptide M401
REFERENCE A60432
   #authors Zafar, R.S.; Walz, D.A.
   #journal Thromb. Res. (1989) 53:31-44
   #title Platelet membrane glycoprotein V: characterization of the
      thrombin-sensitive glycoprotein from human platelets.
   #cross-references MUID:89162331
   #accession A60432
   ##molecule_type protein
   ##residues 477-478, 'FX' ,481-485, 'E' ,487, 'V, 489-492, 'NQ', 495, 'E',
      497-498 ##label ZAF
REFERENCE A47507
   #authors Hickey, M.J.; Hagen, F.S.; Yagi, M.; Roth, G.J.
   #journal Proc. Natl. Acad. Sci. U.S.A. (1993) 90:8327-8331
   #title Human platelet glycoprotein V: characterization of the
      polypeptide and the related Ib-V-IX receptor system of
      adhesive, leucine-rich glycoproteins.
   #cross-references MUID:93391348
   #accession A47507
   ##status preliminary; translated from GB/EMBL/DDBJ
   ##molecule_type mRNA
   ##residues 1-560 ##label RES
   ##cross-references GB:L11238; NID:g388759; PID:g388760
COMMENT This platelet membrane protein is a substrate for thrombin.
COMMENT The amino end of the intact protein is blocked.
COMMENT This protein is absent in Bernard-Soulier syndrome.
GENETICS
   #gene GDB:GP5
   ##cross-references GDB:230236; OMIM:173511
   #map_position 5pter-5qter
CLASSIFICATION #superfamily leucine-rich alpha-2-glycoprotein repeat
   homology
KEYWORDS blocked amino end; glycoprotein; platelet; tandem repeat;
   transmembrane protein
SUMMARY #length 560 #molecular-weight 60958 #checksum 7673

Query Match 21.8%; Score 217; DB 2; Length 560; 

Best Local Similarity 34.4%; Pred. No. 3.68e-20; 

Matches 43; Conservative 28; Mismatches 49; Indels 5; Gaps 4; 

Db 332 QGAFQGLGELQVLALHSNGLTALPDGLLRG-LGKLRQVSLRRNRLRALPRALFRNLSSLE 390 

:IH I : :| I I :| I :: :|| |: |: : II I : :: I III: 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVR 60 

Db 391 SVQLDHNQLETLPGDVFGALPRLTEVLLGHNSWRCDCGLGPFLG-WLRQHLGLVGGEEPP 449 

: I I:: h I :| I: : I |: |:| II: II III : :|:|: I 
Qy 61 LLSLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRR-IVSGN--P 117 

Db 450 RCAGP 454 

tl I 

Qy 118 RCQKP 122 



ENTRY A58532 #type complete
TITLE glial cell membrane glycoprotein LIG-1 precursor - mouse
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 11-Apr-1997 #sequence_revision 11-Apr-1997 #text_change 17-Mar-1999
ACCESSIONS A58532
REFERENCE A58532
   #authors Suzuki, Y.; Sato, N.; Tohyama, M.; Wanaka, A.; Takagi, T.
   #journal J. Biol. Chem. (1996) 271:22522-22527
   #title cDNA cloning of a novel membrane glycoprotein that is
      expressed specifically in glial cells in the mouse brain;
      LIG-1, a protein with leucine-rich repeats and
      immunoglobulin-like domains.
   #cross-references MUID:96394313
   #accession A58532
   ##status preliminary; translated from GB/EMBL/DDBJ
   ##molecule_type mRNA
   ##residues 1-1091 ##label SUZ
   ##cross-references GB:D78572; NID:g1545806; PID:g1545807
CLASSIFICATION #superfamily leucine-rich alpha-2-glycoprotein repeat
   homology; proteoglycan amino-terminal homology;
   proteoglycan carboxyl-terminal homology
FEATURE
   36-61 #domain proteoglycan amino-terminal homology #label PAH\
   71-94 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   95-117 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   118-141 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   142-165 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   166-189 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   191-213 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   214-237 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   238-261 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
   #domain proteoglycan carboxyl-terminal homology #label PCS1
SUMMARY #length 1091 #molecular-weight 119283 #checksum 7937



Query Match 20.9%; Score 208; DB 2; Length 1091;

Best Local Similarity 33.1%; Pred. No. 1.40e-18;
Matches 45; Conservative 30; Mismatches 57; Indels 4; Gaps 4;

Db 350 EGAFKGLKSLRVLDLDHNEISGTIEDTSGAFTGLDNLSKLTLFGNKIKSVAKRAFSGLES 409
lllhl I:: i I I:: |: I II I I I :| I |:: :|:|| | 
Qy 1 EGAFNGAASVQELMLTGNQLE-TVHG-RGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSS 58



Db 410 LEHLNLGENAIRSVQFDAFAKMKNLKELYISSESFLCDCQLKW-LPPWLMGRMLQAFVTA 468 

: 1:1 :l I :: II: : :| : : |::| |:|:| I II I : : ; 
Qy 59 VRLLSLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRRIVS-GNP 117

Db 469 TCAHPESLKGQSIFSV 484 

1111:1 : 
Qy 118 RCQKPFFLKEIPIQGV 133 



RESULT 5 

ENTRY JE0176 #type complete
TITLE orphan G protein-coupled receptor precursor - human
ORGANISM #formal_name Homo sapiens #common_name man
DATE 03-Jul-1998 #sequence_revision 10-Jul-1998 #text_change 17-Mar-1999
ACCESSIONS JE0176
REFERENCE JE0176
   #authors McDonald, T.; Wang, R.; Bailey, W.; Xie, G.; Chen, F.;
      Caskey, C.T.; Liu, Q.
   #journal Biochem. Biophys. Res. Commun. (1998) 247:266-270
   #title Identification and cloning of an orphan G protein-coupled
      receptor of the glycoprotein hormone receptor subfamily.
   #cross-references MUID:98308104
   #accession JE0176
   ##molecule_type mRNA
   ##residues 1-907 ##label MCD
   ##cross-references GB:AF062006
COMMENT This protein is a receptor for a novel class of glycoprotein
   ligands.
GENETICS
   #gene HG38
   #map_position 12q22-23
FEATURE
   1-21 #domain signal sequence #status predicted #label SIG\
   562-583 #domain transmembrane #status predicted #label TM1\
   594-616 #domain transmembrane #status predicted #label TM2\
   639-660 #domain transmembrane #status predicted #label TM3\
   681-701 #domain transmembrane #status predicted #label TM4\
   725-744 #domain transmembrane #status predicted #label TM5\
   768-791 #domain transmembrane #status predicted #label TM6\
   803-824 #domain transmembrane #status predicted #label TM7
SUMMARY #length 907 #molecular-weight 99997 #checksum 8790

Query Match 19.6%; Score 195; DB 2; Length 907;

Best Local Similarity 40.2%; Pred. No. 2.54e-16;

Matches 35; Conservative 16; Mismatches 35; Indels 1; Gaps 1;



Db 108 GAFTGLYSLKVLMLQNNQLRHVPTEALQN-LRSLQSLRLDANHISYVPPSCFSGLHSLRH 166 

III I I: III III I ::: I :| :| I :| |: |: Ml |:| 
Qy 2 GAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRL 61 

Db 167 LWLDDNALTEIPVQAFRSLSALQAMTL 193 

I I II :| I II :| :| :: I 
Qy 62 LSLYDNRITTITPGAFTTLVSLSTINL 88



RESULT 6 

ENTRY A53860 #type complete
TITLE chondroadherin precursor - bovine
ALTERNATE_NAMES 38K leucine-rich protein
ORGANISM #formal_name Bos primigenius taurus #common_name cattle
DATE 07-Oct-1994 #sequence_revision 07-Oct-1994 #text_change 16-Dec-1998
ACCESSIONS A53860
REFERENCE A53860
   #authors Neame, P.J.; Sommarin, Y.; Boynton, R.E.; Heinegard, D.
   #journal J. Biol. Chem. (1994) 269:21547-21554
   #title The structure of a 38-kDa leucine-rich protein
      (chondroadherin) isolated from bovine cartilage.
   #cross-references MUID:94342341
   #accession A53860
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-361 ##label NEA
   ##cross-references GB:O08018; NID:g470671; PID:g470672
CLASSIFICATION #superfamily leucine-rich alpha-2-glycoprotein repeat
   homology; proteoglycan carboxyl-terminal homology
KEYWORDS disulfide bond
FEATURE
   300-346 #domain proteoglycan carboxyl-terminal homology #label PCH
SUMMARY #length 361 #molecular-weight 40861 #checksum 369

Query Match 19.2%; Score 191; DB 2; Length 361;

Best Local Similarity 30.1%; Pred. No. 1.24e-15;

Matches 34; Conservative 31; Mismatches 44; Indels 4; Gaps 4;

Db 223 VEELRLSHNPLKSIPDNAFQSFGRYLETLWLDNTNLEKFSDGAFLGVTTLKHVHLENNRL 282 

1:11:1: I I :: :|:: III:: |: :| |::::: : I :||: 
Qy 10 VQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRI 69 

Db 283 HQL-PSNFP-FDSLETLTLTNNPWKCTCQL-RGLRRWL-EAKTSRPDATCASP 331

: I: I : II I: I :ll :| hi II :|| : :: I I 
Qy 70 TTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRRIVSGNPRCQKP 122 



RESULT 7 

ENTRY JC5239 #type complete
TITLE insulin-like growth factor acid-labile chain - baboon
ORGANISM #formal_name Papio sp. #common_name baboon
DATE 17-Apr-1997 #sequence_revision 09-May-1997 #text_change 09-May-1997
ACCESSIONS JC5239
REFERENCE JC5239
   #authors Delhanty, P.; Baxter, R.C.
   #journal Biochem. Biophys. Res. Commun. (1996) 227:897-902
   #title The cloning and expression of the baboon acid-labile subunit
      of the insulin-like growth factor binding protein complex.
   #cross-references MUID:97040714
   #contents liver
   #accession JC5239
   ##molecule_type mRNA
   ##residues 1-605 ##label DEL
COMMENT This factor is structurally related to proinsulin and has
   insulin-like metabolic, differentiative, and cell proliferative
   activities.
SUMMARY #length 605 #molecular-weight 66110 #checksum 1703

Query Match 18.6%; Score 185; DB 2; Length 605;

Best Local Similarity 34.8%; Pred. No. 1.30e-14;

Matches 31; Conservative 28; Mismatches 27; Indels 3; Gaps 3;

Db 187 DAAFRGLGGLRELVLAGNRLAYLQPALF-SGLAELRELDLSRNALRA-IKANVFAQLPRL 244

::M I ::::lhl:lhl :: I :||: |: I I |: I : : : II |: : 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLML-RSNLIGCVSNDTFAGLSSV 59

Db 245 QKLYLDRNLIAAVAPGAFLGLKALRWLDL 273 

: I I I |::::IHI I :| ::| 
Qy 60 RLLSLYDNRITTITPGAFTTLVSLSTINL 88 



RESULT 8
ENTRY A41915 #type complete
TITLE insulin-like growth factor-binding complex acid-labile chain
   precursor - human
ALTERNATE_NAMES Acid-Labile Subunit (ALS)
ORGANISM #formal_name Homo sapiens #common_name man
DATE 31-Dec-1993 #sequence_revision 31-Dec-1993 #text_change 20-Mar-1998
ACCESSIONS A41915
REFERENCE A41915
   #authors Leong, S.R.; Baxter, R.C.; Camerato, T.; Dai, J.; Wood, W.I.
   #journal Mol. Endocrinol. (1992) 6:870-876
   #title Structure and functional expression of the acid-labile
      subunit of the insulin-like growth factor-binding protein
      complex.
   #cross-references MUID:92357025
   #accession A41915
   ##status preliminary
   ##molecule_type mRNA; protein
   ##residues 1-605 ##label LEO
   ##cross-references GB:M86826; NID:g184807; PID:g184808
   ##experimental_source liver
   ##note sequence extracted from NCBI backbone (NCBIP:110171)
CLASSIFICATION #superfamily leucine-rich alpha-2-glycoprotein repeat
   homology
FEATURE
   75-98 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   99-122 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   123-146 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   147-170 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   171-194 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   195-218 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   219-242 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   243-266 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   267-290 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   291-314 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
   315-338 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
   339-362 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
   363-386 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
   387-410 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
   411-434 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
   435-458 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
   459-482 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR17\
   483-506 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR18\
   507-529 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR19
SUMMARY #length 605 #molecular-weight 66034 #checksum 1870

Query Match 18.5%; Score 184; DB 2; Length 605;

Best Local Similarity 34.8%; Pred. No. 1.93e-14; 

Matches 31; Conservative 27; Mismatches 28; Indels 3; Gaps 3; 

Db 187 DAAFRGLGSLRELVLAGNRLAYLQPALF - SGLAELRELDLSRNALRA- IKANVFVQLPRL 244 

::M I :|::ll:|:||:| :: I :lh I: 1 I h I : : : I |: : 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLML-RSNLIGCVSNDTFAGLSSV 59 

Db 245 QKLYLDRNLIAAVAPGAFLGLKALRWLDL 273 

: I I I |::::IHI I :| ::| 
Qy 60 RLLSLYDNRITTITPGAFTTLVSLSTINL 88 
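
The report does not state how "Best Local Similarity" is derived. The printed figures are, however, consistent with dividing the number of identical positions (Matches) by the total number of alignment columns (Matches + Conservative + Mismatches + Indels). The following is a minimal sketch under that assumption; the function name is hypothetical and is not part of the search program.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent identity over all alignment columns (assumed formula)."""
    columns = matches + conservative + mismatches + indels
    if columns == 0:
        raise ValueError("alignment has no columns")
    return 100.0 * matches / columns


# Counts printed for the alignment above: 31 / 27 / 28 / 3
print(round(best_local_similarity(31, 27, 28, 3), 1))  # 34.8
```

The same arithmetic reproduces the 37.4% reported for RESULT 9 (34, 22, 34, 1), which supports the assumption without proving it.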



RESULT 9 

ENTRY A34901 #type complete

TITLE lysine carboxypeptidase (EC 3.4.17.3) 83K chain - human

ORGANISM #formal_name Homo sapiens #common_name man

DATE 20-Jul-1990 #sequence_revision 20-Jul-1990 #text_change 24-Sep-1998
ACCESSIONS A34901
REFERENCE A34901
#authors Tan, F.; Weerasinghe, D.K.; Skidgel, R.A.; Tamei, H.; Kaul,
R.K.; Roninson, I.B.; Schilling, J.W.; Erdoes, E.G.
#journal J. Biol. Chem. (1990) 265:13-19
#title The deduced protein sequence of the human carboxypeptidase N
high molecular weight subunit reveals the presence of
leucine-rich tandem repeats.
#cross-references MUID:90094386
#accession A34901
##status preliminary
##molecule_type mRNA
##residues 1-536 ##label TAN
##cross-references GB:J05158; NID:g179935; PID:g179936
GENETICS
#gene GDB:ACBP
##cross-references GDB:127893
#map_position 6q25.3-6q26
CLASSIFICATION #superfamily leucine-rich alpha-2-glycoprotein repeat
homology

KEYWORDS hydrolase; metallo-carboxypeptidase

FEATURE
77-100 #domain leucine-rich alpha-2-glycoprotein repeat
homology #label LRR1\
101-124 #domain leucine-rich alpha-2-glycoprotein repeat
homology #label LRR2\
125-148 #domain leucine-rich alpha-2-glycoprotein repeat
homology #label LRR3\
149-172 #domain leucine-rich alpha-2-glycoprotein repeat
homology #label LRR4\
173-196 #domain leucine-rich alpha-2-glycoprotein repeat
homology #label LRR5\
197-220 #domain leucine-rich alpha-2-glycoprotein repeat
homology #label LRR6\
221-244 #domain leucine-rich alpha-2-glycoprotein repeat
homology #label LRR7\
245-268 #domain leucine-rich alpha-2-glycoprotein repeat
homology #label LRR8\
269-292 #domain leucine-rich alpha-2-glycoprotein repeat
homology #label LRR9\
293-316 #domain leucine-rich alpha-2-glycoprotein repeat
homology #label LR10\
317-340 #domain leucine-rich alpha-2-glycoprotein repeat
homology #label LR11\
341-364 #domain leucine-rich alpha-2-glycoprotein repeat
homology #label LR12
SUMMARY #length 536 #molecular-weight 58649 #checksum 8569



Query Match 18.4%; Score 183; DB 2; Length 536;

Best Local Similarity 37.4%; Pred. No. 2.85e-14;

Matches 34; Conservative 22; Mismatches 34; Indels 1; Gaps 1;

Db 117 EGLFQHLAALESLHLQGNQLQALPRRLFQP-LTHLKTLNLAQNLLAQLPEELFHPLTSLQ 175 

M I I::: I I llll::: I |: h |||| I ||:: | |:|:: 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVR 60 

Db 176 TLKLSNNALSGLPQGVFGKLGSLQELFIDSN 206 

11:1::: II III : I II 
Qy 61 LLSLYDNRITTITPGAFTTLVSLSTINLLSN 91 



RESULT 10

ENTRY JC1282 #type complete

TITLE insulin-like growth factor-binding protein acid labile chain
precursor - rat

ORGANISM #formal_name Rattus norvegicus #common_name Norway rat

DATE 30-Sep-1993 #sequence_revision 30-Sep-1993 #text_change 15-Aug-1997
ACCESSIONS JC1282
REFERENCE JC1282
#authors Dai, J.; Baxter, R.C.
#journal Biochem. Biophys. Res. Commun. (1992) 188:304-309
#title Molecular cloning of the acid-labile subunit of the rat
insulin-like growth factor binding protein complex.
#cross-references MUID:93038676
#accession JC1282
##molecule_type mRNA
##residues 1-603 ##label DAI
##experimental_source liver
##note the authors translated the codon AAG for residue 63 as
Arg, AAA for residue 205 as Pro and GGT for residue
260 as Arg

CLASSIFICATION #superfamily leucine-rich alpha-2-glycoprotein repeat
homology

FEATURE
1-27 #domain signal sequence #status predicted #label SIG\
28-603 #product insulin-like growth factor binding protein,
acid labile chain #status predicted #label MAT
SUMMARY #length 603 #molecular-weight 66811 #checksum 8075

Query Match 18.0%; Score 179; DB 2; Length 603;

Best Local Similarity 37.5%; Pred. No. 1.35e-13;

Matches 39; Conservative 22; Mismatches 41; Indels 2; Gaps 2;



Db 356 GAFSGLFNVAVMNLSGNCLRSLPERVFQG-LDKLHSLHLEHSCLGHVRLHTFAGLSGLRR 414
HI: : ; |:|| | :: l:| I :| I : :| I ||||||::| 
Qy 2 GAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRL 61

Db 415 LFLRDNSISSIEEQSLAGLSELLELDLTTNRLTHLPRQLFQGLG 458

I I II l::| :=: I I ::| :| : :| III 
Qy 62 LSLYDNRITTITPGAFTTLVSLSTINLLSNPFNC-NCHLGAGLG 104



RESULT 11 

ENTRY A38971 #type complete

TITLE polycystic kidney disease protein 1 precursor - human

ORGANISM #formal_name Homo sapiens #common_name man

DATE 26-Jan-1996 #sequence_revision 26-Jan-1996 #text_change 16-Dec-1998
ACCESSIONS A38971; A56520; A44604
REFERENCE A38971
#authors Harris, P.C.
#submission submitted to GenBank, May 1995
#accession A38971
##molecule_type mRNA
##residues 1-4302 ##label HAR
##cross-references GB:L33243; NID:g904222; PID:g904223
REFERENCE A56520 

#authors Alexandra Gluecksmann-Kuis, M.; Tayber, O.; Woolf, E.A.;
Bougueleret, L.; Deng, N.; Alperin, G.D.; Iris, F.;
Hawkins, F.; Munro, C.; Lakey, N.; Duyk, G.; Schneider,
M.C.; Geng, L.; Zhang, F.; Zhao, Z.; Torosian, S.; Zhou,
J.; Reeders, S.T.; Bork, P.; Pohlschmidt, M.; Loehning, C.;
Kraus, B.; Nowicka, U.; Leung, A.L.S.; Frischauf, A.M.
#journal Cell (1995) 81:289-298
#title Polycystic kidney disease: the complete structure of the PKD1
gene and its protein.
#accession A56520
##status preliminary; nucleic acid sequence not shown
##molecule_type mRNA
##residues 1-70,'E',72-137,'Q',139-252,'A',254-301,'D',303-690,'P',
692-738,'R',740-762,'G',764-773,'QR',776-791,'L',
793-865,'L',867-883,'A',885-1055,'T',1057-1091,'T',
1093-1276,'G',1278-1723,'A',1725-1975,'V',1977-3389,
'Q',3390-3980,'HV',3983-4003,'HV',4006-4302 ##label
ALE
##cross-references GB:U24497; NID:g799334; PID:g799335; GB:024499
REFERENCE A44604
#authors Ward, C.J.; Peral, B.; Hughes, J.; Thomas, S.; Gamble, V.;
MacCarthy, A.B.; Sloane-Stanley, J.; Buckle, V.J.; Kearney,
L.; Higgs, D.R.; Ratcliffe, P.J.; Harris, P.C.; Roelfsema,
J.H.; Spruit, L.; Saris, J.J.; Dauwerse, H.G.; Peters,
D.J.M.; Breuning, M.H.; Nellist, M.; Brook-Carter, P.T.;
Maheshwar, M.M.; Cordeiro, I.; Santos, H.; Cabral, P.;
Sampson, J.R.; Janssen, B.; Hesseling-Janssen, A.L.W.; van
den Ouweland, A.M.W.; Eussen, B.; Verhoef, S.; Lindhout,
D.; Halley, D.J.J.
#journal Cell (1994) 77:881-894
#title The polycystic kidney disease 1 gene encodes a 14 kb
transcript and lies within a duplicated region on
chromosome 16.
#accession A44604
##status significant sequence differences
##molecule_type mRNA
##cross-references GB:L33243
REFERENCE A38972
#authors Ward, C.J.; Peral, B.; Hughes, J.; Thomas, S.; Gamble, V.;
MacCarthy, A.B.; Sloane-Stanley, J.; Buckle, V.J.; Kearney,
L.; Higgs, D.R.; Ratcliffe, P.J.; Harris, P.C.; Roelfsema,
J.H.; Spruit, L.; Saris, J.J.; Dauwerse, H.G.; Peters,
D.J.M.; Breuning, M.H.; Nellist, M.; Brook-Carter, P.T.;
Maheshwar, M.M.; Cordeiro, I.; Santos, H.; Cabral, P.;
Sampson, J.R.; Janssen, B.; Hesseling-Janssen, A.L.W.; van
den Ouweland, A.M.W.; Eussen, B.; Verhoef, S.; Lindhout,
D.; Halley, D.J.J.
#journal Cell (1994) 78:725
#contents annotation; erratum
#note this is a revision to the sequence from reference A44604
REFERENCE A56732
#authors Ward, C.J.; Peral, B.; Hughes, J.; Thomas, S.; Gamble, V.;
MacCarthy, A.B.; Sloane-Stanley, J.; Buckle, V.J.; Kearney,
L.; Higgs, D.R.; Ratcliffe, P.J.; Harris, P.C.; Roelfsema,
J.H.; Spruit, L.; Saris, J.J.; Dauwerse, H.G.; Peters,
D.J.M.; Breuning, M.H.; Nellist, M.; Brook-Carter, P.T.;
Maheshwar, M.M.; Cordeiro, I.; Santos, H.; Cabral, P.;
Sampson, J.R.; Janssen, B.; Hesseling-Janssen, A.L.W.; van
den Ouweland, A.M.W.; Eussen, B.; Verhoef, S.; Lindhout,
D.; Halley, D.J.J.
#journal Cell (1995) 81:1171
#contents annotation; erratum
#note this is a revision to the sequence from reference A44604

GENETICS 

#gene GDB:PKD1
##cross-references GDB:120293; OMIM:173900; OMIM:601313
#map_position 16p13.3-16p13.3
CLASSIFICATION #superfamily proteoglycan carboxyl-terminal homology
KEYWORDS duplication
FEATURE
1-23 #domain signal sequence #status predicted #label SIG\
123-170 #domain proteoglycan carboxyl-terminal homology #label
PCS1
SUMMARY #length 4302 #molecular-weight 462436 #checksum 2698

Query Match 17.6%; Score 175; DB 2; Length 4302;

Best Local Similarity 32.2%; Pred. No. 6.33e-13;

Matches 39; Conservative 29; Mismatches 47; Indels 6; Gaps 6;

Db 52 SGRGLRTL-GPALRIP-ADATALDVSHNLLRALDVGLLANLSALAELDISNNKISTLEEG 109 

:| 11:1 ::| : :| : II: : :| II:: I : :|:|:|: I 
Qy 16 TGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRITTITPG 75 

Db 110 IFANLFNLSEINLSGNPFECDCGL-AWLPRWAEEQQVRWQPEAATCAGPGSLAGQPLLG 168 

I: I :|| III :KI:|: I I I :| : |:| : : I I I |: I 
Qy 76 AFTTLVSLSTINLLSNPFNCNCHLGAGLGRWLR-KR-RIV-SGNPRCQKPFFLKEIPIQG 132 



Qy 133 V 133 



RESULT 12 

ENTRY A43318 #type complete

TITLE connectin precursor - fruit fly (Drosophila melanogaster)

ORGANISM #formal_name Drosophila melanogaster

DATE 31-Dec-1993 #sequence_revision 31-Dec-1993 #text_change 24-Sep-1998
ACCESSIONS A43318; S28464
REFERENCE A43318
#authors Nose, A.; Mahajan, V.B.; Goodman, C.S.
#journal Cell (1992) 70:553-567
#title Connectin: a homophilic cell adhesion molecule expressed on a
subset of muscles and the motoneurons that innervate them
in Drosophila.
#cross-references MUID:92370678
#accession A43318
##molecule_type mRNA
##residues 1-682 ##label NOS
##cross-references GB:M96647; NID:g157083; PID:g157084
##note sequence extracted from NCBI backbone (NCBIN:111422,
NCBIP:111423)
REFERENCE S28464
#authors Gould, A.P.; White, R.A.H.
#submission submitted to the EMBL Data Library, October 1992
#description Connectin a target of homeotic gene control in drosophila.
#accession S28464
##molecule_type mRNA
##residues 1-630,'G',632-673,675-678,'M',679-682 ##label GOD
##cross-references EMBL:X68701; NID:g7737; PID:g7738
GENETICS
#gene FlyBase:Con
##cross-references FlyBase:FBgn0005775
CLASSIFICATION #superfamily leucine-rich alpha-2-glycoprotein repeat
homology
FEATURE
1-26 #domain signal sequence #status predicted #label SIG\
27-682 #product connectin #status predicted #label MAT
SUMMARY #length 682 #molecular-weight 75991 #checksum 7269

Query Match 17.3%; Score 172; DB 2; Length 682; 

Best Local Similarity 39.1%; Pred. No. 2.00e-12; 

Matches 36; Conservative 19; Mismatches 34; Indels 3; Gaps 3; 

Db 263 EGLFADMARLTFLNLAHNQINVLTSEIFRG-LGNLNVLKLTRNNL-NFIGDTVFAELWSL 320 

III I : I I: II:: : : III I: |: hi |:|| ::: II I h 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLML-RSNLIGCVSNDTFAGLSSV 59 



Db 321 SELELDDNRIERISERALDGLNTLKTLNLRNN 352 

I I Mil I: I: I :| Ml :| 
Qy 60 RLLSLYDNRITTITPGAFTTLVSLSTINLLSN 91 



RESULT 13 

ENTRY A49121 #type complete

TITLE cell-surface molecule connectin - fruit fly (Drosophila
melanogaster)

ORGANISM #formal_name Drosophila melanogaster

DATE 19-Dec-1993 #sequence_revision 18-Nov-1994 #text_change 20-Mar-1998
ACCESSIONS A49121
REFERENCE A49121
#authors Gould, A.P.; White, R.A.
#journal Development (1992) 116:1163-1174
#title Connectin, a target of homeotic gene control in Drosophila.
#cross-references MUID:93202002
#accession A49121
##status preliminary
##molecule_type nucleic acid
##residues 1-682 ##label GOU
##cross-references GB:X68701; NID:g7737; PID:g7738
##experimental_source embryo
##note sequence extracted from NCBI backbone (NCBIN:127661,
NCBIP:127664)
GENETICS
#gene FlyBase:Con
##cross-references FlyBase:FBgn0005775
CLASSIFICATION #superfamily leucine-rich alpha-2-glycoprotein repeat
homology
FEATURE
199-222 #domain leucine-rich alpha-2-glycoprotein repeat
homology #label LRR1\
#domain leucine-rich alpha-2-glycoprotein repeat
homology #label LRR2\
#domain leucine-rich alpha-2-glycoprotein repeat
homology #label LRR3\
#domain leucine-rich alpha-2-glycoprotein repeat
homology #label LRR4\
#domain leucine-rich alpha-2-glycoprotein repeat
homology #label LRR5\
#domain leucine-rich alpha-2-glycoprotein repeat
homology #label LRR6\
#domain leucine-rich alpha-2-glycoprotein repeat
homology #label LRR7
SUMMARY #length 682 #molecular-weight '5922 #checksum 7093

Query Match 17.3%; Score 172; DB 2; Length 682;

Best Local Similarity 39.1%; Pred. No. 2.00e-12;

Matches 36; Conservative 19; Mismatches 34; Indels 3; Gaps 3;

Db 263 EGLFADMARLTFLNLAHNQINVLTSEIFRG-LGNLNVLKLTRNNL-NFIGDTVFAELWSL 320 

III I : I I: II:: : : III I: h hi hll ::: II I h 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLML-RSNLIGCVSNDTFAGLSSV 59 

Db 321 SELELDDNRIERISERALDGLNTLKTLNLRNN 352

I I Mil h h I :| hll :| 
Qy 60 RLLSLYDNRITTITPGAFTTLVSLSTINLLSN 91 



RESULT 14 

ENTRY JC6128 #type complete

TITLE insulin-like growth factor binding complex acid labile chain
- mouse

ORGANISM #formal_name Mus musculus #common_name house mouse

DATE 23-Mar-1997 #sequence_revision 09-May-1997 #text_change 10-Sep-1997
ACCESSIONS JC6128
REFERENCE JC6128
#authors Boisclair, Y.R.; Seto, D.; Hsieh, S.; Hurst, K.R.; Ooi, G.T.
#journal Proc. Natl. Acad. Sci. U.S.A. (1996) 93:10028-10033
#title Organization and chromosomal localization of the gene
encoding the mouse acid labile subunit of the insulin-like
growth factor binding complex.
#cross-references MUID:96413591
#accession JC6128
##molecule_type DNA
##residues 1-603 ##label BOI
##cross-references GB:U66900; NID:g1621612; PID:g1621613
COMMENT This protein is a serum protein and is part of the ternary complex in
the physiology of circulating insulin-like growth factor.

GENETICS
#gene als
#map_position 17
SUMMARY #length 603 #molecular-weight 66959 #checksum 7670

Query Match 16.7%; Score 166; DB 2; Length 603;

Best Local Similarity 32.1%; Pred. No. 1.97e-11;

Matches 34; Conservative 27; Mismatches 43; Indels 2; Gaps 2;



Db 283 EDTFPGLLGLHVLRLAHNAITSLRPRTFKD-LHFLEELQLGHNRIRQLGEKTFEGLGQLE 341 

I :t I ::: | I: | : ::: | |: I I | | | | ;;; || ||; : 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVR 60 

Db 342 VLTLNDNQIHEVKVGAFFGLFNVAVMNLSGNCLRSLPEHVFQGLGR 387 

:hl Ihl : III I ::: :|| :| : |: III: 
Qy 61 LLSLYDNRITTITPGAFTTLVSLSTINLLSNPFNCKC-HLGAGLGK 105 



RESULT 15

ENTRY A53531 #type complete

TITLE oncofetal trophoblast glycoprotein 5T4 precursor - human

ALTERNATE_NAMES oncofetal antigen 5T4 - human
ORGANISM #formal_name Homo sapiens #common_name man

DATE 27-Jun-1994 #sequence_revision 27-Jun-1994 #text_change 10-Sep-1997
ACCESSIONS A53531; S40087
REFERENCE A53531
#authors Myers, K.A.; Rahi-Saund, V.; Davison, M.D.; Young, J.A.;
Cheater, A.J.; Stern, P.L.
#journal J. Biol. Chem. (1994) 269:9319-9324
#title Isolation of a cDNA encoding 5T4 oncofetal trophoblast
glycoprotein. An antigen associated with metastasis
contains leucine-rich repeats.
#cross-references MUID:94179356
#accession A53531
##status preliminary
##molecule_type mRNA
##residues 1-420 ##label MEY
##cross-references EMBL:Z29083; NID:g435654; PID:g435655
CLASSIFICATION #superfamily leucine-rich alpha-2-glycoprotein repeat
homology
KEYWORDS duplication; glycoprotein; transmembrane protein
FEATURE
1-31 #domain signal sequence #status predicted #label SIG\
32-420 #product oncofetal trophoblast glycoprotein 5T4 #status
predicted #label MAT
SUMMARY #length 420 #molecular-weight 46031 #checksum 8580

Query Match 16.6%; Score 165; DB 2; Length 420; 

Best Local Similarity 28.7%; Pred. No. 2.88e-11;

Matches 31; Conservative 31; Mismatches 42; Indels 4; Gaps 4; 

Db 215 LELASNHFLYLP-RDVLAQLPSLRHLDLSNNSLVSLTYVSFRNLTHLESLHLEDNALKVL 273
W I I::|:: : I : |::|: I I :| : :: :| |: : MM: 

Qy 13 LMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRITTI 72

Db 274 HNGTLAELQGLPHIRVFLDNNPWVCDCHM-ADMVTWLKETEWQGKDR 320 

I::: I :|: I : I :|| |:||: I : II: :| |: I 
Qy 73 TPGAFTTLVSLSTINL-L-SNPFNCNCHLGAGLGKWLRKRRIVSGNPR 118



Search completed: Fri May 28 08:48:45 1999 
Job time : 52 secs.
Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 08:49:03 1999; MasPar time 5.87 Seconds

665.018 Million cell updates/sec

Tabular output not generated.

Title: MJS-09-191-647-4
Description: (1-138) from US09191647.pep
Perfect Score: 995
Sequence: 1 EGAFNGAASVQELMLTGNQL CQKPFFLKEIPIQGVGHPGI 138

Scoring table: PAM 150
Gap 11

Searched: 77977 seqs, 28268293 residues

Post-processing: Minimum Match 0%

Listing first 45 summaries

Database: Swiss-prot37
1:swissprot

Statistics: Mean 43.452; Variance 74.748; scale 0.581

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution . 
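
Two of the header figures can be cross-checked with simple arithmetic. The sketch below assumes that "Query Match" is the hit score expressed as a percentage of the Perfect Score, and that the throughput figure counts one cell update per (query residue, database residue) pair; neither definition is spelled out in the report, and the function names are hypothetical.

```python
def query_match_percent(score, perfect_score):
    """Hit score as a percentage of the query's self-match score."""
    return 100.0 * score / perfect_score


def cell_updates_per_second(query_length, db_residues, seconds):
    """Dynamic-programming cells filled per second of search time."""
    return query_length * db_residues / seconds


# Top hit in this run: score 326 against a Perfect Score of 995.
print(round(query_match_percent(326, 995), 1))  # 32.8

# Query of 138 residues against 28268293 database residues in 5.87 s.
rate = cell_updates_per_second(138, 28268293, 5.87)
print(f"{rate / 1e6:.1f} million cell updates/sec")
```

The computed rate comes out slightly below the printed 665.018 million, which is what one would expect if the printed time of 5.87 seconds is itself rounded.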



SUMMARIES

Result           Query
   No.  Score    Match  Length  DB  ID          Description             Pred. No.
     1    326     32.8    1480   1  SLIT_DROME  SLIT PROTEIN PRECURSOR  5.61e-47
     2    217     21.8     560   1  GPV_HUMAN   PLATELET GLYCOPROTEIN   1.03e-23
     3    191     19.2     361   1  CHAD_BOVIN  CHONDROADHERIN PRECURS  1.76e-18
     4    185     18.6     605   1  ALS_PAPPA   INSULIN-LIKE GROWTH FA  2.67e-17
     5    184     18.5     605   1  ALS_HUMAN   INSULIN-LIKE GROWTH FA  4.20e-17
     6    183     18.4     536   1  CBP8_HUMAN  CARBOXYPEPTIDASE N 83   6.59e-17
     7    179     18.0     603   1  ALS_RAT     INSULIN-LIKE GROWTH FA  3.97e-16
     8    176     17.7     567   1  GPV_RAT     PLATELET GLYCOPROTEIN   1.51e-15
     9    172     17.3     682   1  CONN_DROME  CONNECTIN PRECURSOR.    8.93e-15
    10    171     17.2     567   1  GPV_MOUSE   PLATELET GLYCOPROTEIN   1.39e-14
    11    167     16.8    4303   1  PKD1_HUMAN  POLYCYSTIN PRECURSOR (  8.06e-14
    12    166     16.7     603   1  ALS_MOUSE   INSULIN-LIKE GROWTH FA  1.25e-13
    13    158     15.9    1115   1  GPCR_LYMST  G-PROTEIN COUPLED RECE  4.01e-12
    14    149     15.0    1134   1  CHAO_DROME  CHAOPTIN PRECURSOR (PH  1.85e-10
    15    144     14.5     312   1  A2GL_HUMAN  LEUCINE-RICH ALPHA-2-G  1.50e-09
    16    138     13.9     343   1  LUM_CHICK   LUMICAN PRECURSOR (LUM  1.77e-08
    17    138     13.9     626   1  GPBA_HUMAN  PLATELET GLYCOPROTEIN   1.77e-08
    18    136     13.7    1097   1  TOLL_DROME  TOLL PROTEIN PRECURSOR  4.00e-08
    19    135     13.6     359   1  PGS2_HUMAN  BONE PROTEOGLYCAN II P  6.00e-08
    20    134     13.5     177   1  GPIX_HUMAN  PLATELET GLYCOPROTEIN   8.99e-08
    21    134     13.5     662   1  GARP_HUMAN  GARP PROTEIN PRECURSOR  8.99e-08
    22    132     13.3     360   1  PGS2_BOVIN  BONE PROTEOGLYCAN II P  2.01e-07
    23    132     13.3     942   1  TMK1_ARATH  PUTATIVE RECEPTOR PROT  2.01e-07
    24    130     13.1     360   1  PGS2_CANFA  BONE PROTEOGLYCAN II P  4.47e-07
    25    129     13.0     206   1  GPBB_HUMAN  PLATELET GLYCOPROTEIN   6.65e-07
    26    126     12.7     852   1  TRKC_CHICK  NT-3 GROWTH FACTOR REC  2.17e-06
    27    125     12.6     368   1  PGS1_HUMAN  BONE/CARTILAGE PROTEOG  3.22e-06
    28    125     12.6     369   1  PGS1_BOVIN  BONE/CARTILAGE PROTEOG  3.22e-06
    29    125     12.6     369   1  PGS1_MOUSE  BONE/CARTILAGE PROTEOG  3.22e-06
    30    125     12.6     369   1  PGS1_RAT    BONE/CARTILAGE PROTEOG  3.22e-06
    31    124     12.5     354   1  PGS2_MOUSE  BONE PROTEOGLYCAN II P  4.75e-06
    32    124     12.5     354   1  PGS2_RAT    BONE PROTEOGLYCAN II P  4.75e-06
    33    123     12.4     206   1  GPBB_MOUSE  PLATELET GLYCOPROTEIN   7.01e-06
    34    123     12.4     369   1  PGS1_CANFA  BONE/CARTILAGE PROTEOG  7.01e-06
    35    122     12.3     818   1  TRKB_CHICK  BDNF / NT-3 GROWTH FAC  1.03e-05
    36    121     12.2     839   1  TRKC_HUMAN  NT-3 GROWTH FACTOR REC  1.52e-05
    37    120     12.1     825   1  TRKC_PIG    NT-3 GROWTH FACTOR REC  2.23e-05
    38    119     12.0     357   1  PGS2_CHICK  BONE PROTEOGLYCAN II P  3.27e-05
    39    118     11.9     796   1  TRKA_HUMAN  HIGH AFFINITY NERVE GR  4.78e-05
    40    118     11.9     821   1  TRKB_RAT    BDNF / NT-3 GROWTH FAC  4.78e-05
    41    118     11.9     821   1  TRKB_MOUSE  BDNF / NT-3 GROWTH FAC  4.78e-05
    42    117     11.8     440   1  OMGP_MOUSE  OLIGODENDROCYTE-MYELIN  6.99e-05
    43    115     11.6     208   1  GPBB_PAPCY  PLATELET GLYCOPROTEIN   1.48e-04
    44    115     11.6     864   1  TRKC_RAT    NT-3 GROWTH FACTOR REC  1.48e-04
    45    114     11.5     440   1  OMGP_HUMAN  OLIGODENDROCYTE-MYELIN  2.16e-04


RESULT 1 

ID SLIT_DROME STANDARD; PRT; 1480 AA. 

AC P24014; 

DT 01-MAR-1992 (REL. 21, CREATED)

DT 01-MAR-1992 (REL. 21, LAST SEQUENCE UPDATE) 

DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE) 

DE SLIT PROTEIN PRECURSOR. 

GN SLI. 

OS DROSOPHILA MELANOGASTER (FRUIT FLY). 

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 91099665.

RA ROTHBERG J.M., JACOBS J.R., GOODMAN C.S., ARTAVANIS-TSAKONAS S.;

RT "Slit: an extracellular protein necessary for development of midline 

RT glia and commissural axon pathways contains both EGF and LRR 

RT domains."; 

RL GENES DEV. 4:2169-2187(1990). 

CC -!- FUNCTION: NECESSARY FOR DEVELOPMENT OF MIDLINE GLIA AND
CC COMMISSURAL AXON PATHWAYS. SLIT MAY INTERACT WITH EXTRACELLULAR
CC MATRIX MOLECULES.
CC -!- TISSUE SPECIFICITY: EXCRETED BY THE MIDLINE GLIA CELLS AND
CC EVENTUALLY DISTRIBUTED ALONG THE AXONS.
CC -!- ALTERNATIVE PRODUCTS: GIVES RISE TO 2 DISTINCT PROTEINS DIFFERING
CC BY 11 AA AT THE C-TERMINUS OF THE LAST EGF REPEAT.
CC -!- SIMILARITY: CONTAINS 7 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 22. TWO BLOCKS OF 6 LRR'S
CC AND TWO BLOCKS OF 5 LRR'S.
CC -!- SIMILARITY: CONTAINS A C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK).
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC

DR EMBL; X53959; G8615; -. 

DR PIR; A36665; A36665. 

DR FLYBASE; FBgn0003425; sli. 

DR PROSITE; PS00010; ASX_HYDROXYL; 3.
DR PROSITE; PS00022; EGF_1; 7.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 5.
DR PROSITE; PS01187; EGF_CA; 2.
DR PROSITE; PS01225; CTCK_2; 1.
DR PFAM; PF00007; Cys_knot; 1.
DR PFAM; PF00008; EGF; 7.
DR PFAM; PF00054; laminin_G; 1.

DR PFAM; PF00560; LRR; 10. 

DR HSSP; P00740; 1IXA. 

KW NEUROGENESIS; GLYCOPROTEIN; SIGNAL; ALTERNATIVE SPLICING; 

KW EGF-LIKE DOMAIN; REPEAT; LEUCINE-REPEAT; DUPLICATION. 



FT SIGNAL       1     36
FT CHAIN       37   1480
FT DOMAIN      70    104   CONSERVED N-FLANKING REGION OF THE LRR.
FT DOMAIN     105    230
FT DOMAIN     231    294
FT DOMAIN     295    326   CONSERVED N-FLANKING REGION OF THE LRR.
FT DOMAIN     327    452   LEUCINE-RICH REPEATS (2ND REGION).
FT DOMAIN     453    518
FT DOMAIN     519    550   CONSERVED N-FLANKING REGION OF THE LRR.
FT DOMAIN     551    653   LEUCINE-RICH REPEATS (3RD REGION).
FT DOMAIN     654    714   CONSERVED C-FLANKING REGION OF THE LRR.
FT DOMAIN     715    746   CONSERVED N-FLANKING REGION OF THE LRR.
FT DOMAIN     747    848   LEUCINE-RICH REPEATS (4TH REGION).
FT DOMAIN     849    910   CONSERVED C-FLANKING REGION OF THE LRR.
FT REPEAT     105    115   LRR 1-1.
FT REPEAT     116    139   LRR 1-2.
FT REPEAT     140    163   LRR 1-3.
FT REPEAT     164    187   LRR 1-4.
FT REPEAT     188    211   LRR 1-5.
FT REPEAT     212    230   LRR 1-6.
FT REPEAT     327    337   LRR 2-1.
FT REPEAT     338    361   LRR 2-2.
FT REPEAT     362    385   LRR 2-3.
FT REPEAT     386    409   LRR 2-4.
FT REPEAT     410    433   LRR 2-5.
FT REPEAT     434    452   LRR 2-6.
FT REPEAT     551    562   LRR 3-1.
FT REPEAT     563    586   LRR 3-2.
FT REPEAT     587    610   LRR 3-3.
FT REPEAT     611    634   LRR 3-4.
FT REPEAT     635    653   LRR 3-5.
FT REPEAT     747    757   LRR 4-1.
FT REPEAT     758    781   LRR 4-2.
FT REPEAT     782    805   LRR 4-3.
FT REPEAT     806    829   LRR 4-4.
FT REPEAT     830    848   LRR 4-5.
FT DOMAIN     907    944   EGF-LIKE 1.
FT DOMAIN     946    983   EGF-LIKE 2.
FT DOMAIN     985   1022   EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN    1024   1062   EGF-LIKE 4.
FT DOMAIN    1064   1100   EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN    1111   1149   EGF-LIKE 6.
FT DOMAIN    1353   1392   EGF-LIKE 7.
FT DOMAIN    1409   1480   CTCK.
FT VARSPLIC  1394   1404   MISSING (IN SHORT FORM).
FT CARBOHYD   111    111   POTENTIAL.
FT CARBOHYD   207    207   POTENTIAL.
FT CARBOHYD   357    357   POTENTIAL.
FT CARBOHYD   435    435   POTENTIAL.
FT CARBOHYD   783    783   POTENTIAL.
FT CARBOHYD   788    788   POTENTIAL.
FT CARBOHYD   958    958   POTENTIAL.
FT CARBOHYD   998    998   POTENTIAL.
FT CARBOHYD  1060   1060   POTENTIAL.
FT CARBOHYD  1159   1159   POTENTIAL.
FT CARBOHYD  1175   1175   POTENTIAL.
FT CARBOHYD  1243   1243   POTENTIAL.
FT CARBOHYD  1292   1292   POTENTIAL.
FT DISULFID   911    922   BY SIMILARITY.
FT DISULFID   916    932   BY SIMILARITY.
FT DISULFID   934    943   BY SIMILARITY.
FT DISULFID   950    961   BY SIMILARITY.
FT DISULFID   955    971   BY SIMILARITY.
FT DISULFID     ?      ?   BY SIMILARITY.
FT DISULFID   989   1001   BY SIMILARITY.
FT DISULFID     ?      ?   BY SIMILARITY.
FT DISULFID  1012   1021   BY SIMILARITY.
FT DISULFID  1028      ?   BY SIMILARITY.
FT DISULFID  1035      ?   BY SIMILARITY.
FT DISULFID     ?      ?   BY SIMILARITY.
FT DISULFID  1068   1079   BY SIMILARITY.
FT DISULFID     ?   1088   BY SIMILARITY.
FT DISULFID  1090   1099   BY SIMILARITY.
FT DISULFID     ?   1125   BY SIMILARITY.
FT DISULFID  1120   1137   BY SIMILARITY.
FT DISULFID  1139   1148   BY SIMILARITY.
FT DISULFID  1357   1368   BY SIMILARITY.
FT DISULFID  1362   1380   BY SIMILARITY.
FT DISULFID  1382   1391   BY SIMILARITY.
FT DISULFID  1409   1443   BY SIMILARITY.
FT DISULFID  1423   1457   BY SIMILARITY.
FT DISULFID  1434   1473   BY SIMILARITY.
FT DISULFID  1438   1475   BY SIMILARITY.
FT DISULFID  1442   1479   BY SIMILARITY.
SQ SEQUENCE 1480 AA; 165752 MW; 2CD1C421 CRC32;

Query Match 32.8%; Score 326; DB 1; Length 1480;



Best Local Similarity 37.5%; Pred. No. 5.61e-47; 
Matches 45; Conservative 35; Mismatches 38; Indels 2; Gaps 2; 

Db 365 ALSGLKQLTTLVLYGNKIKDLPSGVFKG-LGSLRLLLLNANEISCIRKDAFRDLHSLSLL 423 

: hi II : : : hi l::|: hi :l |:|: :|:| I I: II 
Qy 3 AFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLL 62 

Db 424 SLYDNNIQSLANGTFDAMKSMRTVHLAKNPFICDCNLRW-LADYLHKNPIETSGARCESP 482 

Mill I ::: hi':: h |::| III |:|:| |: |:| I :: :||: I 
Qy 63 SLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRRIVSGNPRCQKP 122 
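
The Matches and Indels counts in each alignment header can in principle be recounted from the Db and Qy rows themselves. The sketch below is an illustration with a hypothetical helper name, not the search program's own code: it counts identical columns and gap columns for one pair of aligned rows. It does not separate Conservative from Mismatches, because that split depends on the PAM 150 scoring table, which is not reproduced here.

```python
def column_counts(db_row, qy_row, gap="-"):
    """Count identical, gapped, and other columns in two aligned rows."""
    if len(db_row) != len(qy_row):
        raise ValueError("aligned rows must have equal length")
    identical = gaps = other = 0
    for d, q in zip(db_row, qy_row):
        if d == gap or q == gap:
            gaps += 1          # an indel column
        elif d == q:
            identical += 1     # counted under "Matches"
        else:
            other += 1         # conservative or mismatch
    return {"identical": identical, "gaps": gaps, "other": other}


# First row pair of the RESULT 1 alignment, as transcribed above.
db = "ALSGLKQLTTLVLYGNKIKDLPSGVFKG-LGSLRLLLLNANEISCIRKDAFRDLHSLSLL"
qy = "AFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLL"
print(column_counts(db, qy))
```

Summing the counts over every row pair of an alignment should approach the header totals, provided the sequence rows were transcribed without error.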



RESULT 2 

ID GPV_HUMAN STANDARD; PRT; 560 AA.

AC P40197;

DT 01-FEB-1995 (REL. 31, CREATED)

DT 01-FEB-1995 (REL. 31, LAST SEQUENCE UPDATE)

DT 01-FEB-1995 (REL. 31, LAST ANNOTATION UPDATE)

DE PLATELET GLYCOPROTEIN V PRECURSOR (GPV) (CD42D).

GN GP5.

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A.
RC TISSUE-LUNG;
RX MEDLINE; 93391348.
RA HICKEY M.J., HAGEN F.S., YAGI M., ROTH G.J.;
RT "Human platelet glycoprotein V: characterization of the polypeptide
RT and the related Ib-V-IX receptor system of adhesive, leucine-rich
RT glycoproteins.";
RL PROC. NATL. ACAD. SCI. U.S.A. 90:8327-8331(1993).
RN [2]
RP SEQUENCE FROM N.A.
RC TISSUE-PLATELET;
RX MEDLINE; 94012616.
RA LANZA F., MORALES M., DE LA SALLE C., CAZENAVE J.-P., CLEMETSON K.J.,
RA SHIMOMURA T., PHILLIPS D.R.;
RT "Cloning and characterization of the gene encoding the human platelet
RT glycoprotein V. A member of the leucine-rich glycoprotein family
RT cleaved during thrombin-induced platelet activation.";
RL J. BIOL. CHEM. 268:20801-20807(1993).
RN [3]
RP PARTIAL SEQUENCE.
RC TISSUE-PLATELET;
RX MEDLINE; 90275263.
RA SHIMOMURA T., FUJIMURA K., MAEHAMA S., TAKEMOTO M., ODA K.,
RA FUJIMOTO T., OYAMA R., SUZUKI M., ICIHARA-TANAKA K., TITANI K.,
RA KURAMOTO A.;
RT "Rapid purification and characterization of human platelet
RT glycoprotein V: the amino acid sequence contains leucine-rich
RT repetitive modules as in glycoprotein Ib.";
RL BLOOD 75:2349-2356(1990).
RN [4]
RP PARTIAL SEQUENCE.
RC TISSUE-PLATELET;
RX MEDLINE; 90321220.
RA ROTH G.J., CHURCH T.A., MCMULLEN B.A., WILLIAMS S.A.;
RT "Human platelet glycoprotein V: a surface leucine-rich glycoprotein
RT related to adhesion.";
RL BIOCHEM. BIOPHYS. RES. COMMUN. 170:153-161(1990).
CC -!- FUNCTION: THE GPIB-V-IX COMPLEX FUNCTIONS AS THE VON WILLEBRAND
CC FACTOR RECEPTOR AND MEDIATES VON WILLEBRAND FACTOR-DEPENDENT
CC PLATELET ADHESION TO BLOOD VESSELS. THE ADHESION OF PLATELETS TO
CC INJURED VASCULAR SURFACES IN THE ARTERIAL CIRCULATION IS A
CC CRITICAL INITIATING EVENT IN HEMOSTASIS.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- TISSUE SPECIFICITY: PLATELETS AND MEGAKARYOCYTES.
CC -!- PTM: THE N-TERMINAL IS BLOCKED.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 15.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC

DR EMBL; L11238; G388760; -.
DR EMBL; Z23091; G312502; -.
DR MIM; 173511; -.
DR PFAM; PF00560; LRR; 8.
DR HSSP; P16473; 1XUM.
KW PLATELET; TRANSMEMBRANE; GLYCOPROTEIN; BLOOD COAGULATION;
KW REPEAT; LEUCINE-REPEAT; CELL ADHESION; SIGNAL.



FT SIGNAL       1     16   POTENTIAL.
FT CHAIN       17    560   PLATELET GLYCOPROTEIN V.
FT DOMAIN      17    523   EXTRACELLULAR (POTENTIAL).
FT TRANSMEM   524    544   POTENTIAL.
FT DOMAIN     545    560   CYTOPLASMIC (POTENTIAL).
FT DOMAIN      55    415   LEUCINE-RICH REPEATS.
FT REPEAT      55     78   LRR 1.
FT REPEAT      79    102   LRR 2.
FT REPEAT     103    126   LRR 3.
FT REPEAT     127    150   LRR 4.
FT REPEAT     151    174   LRR 5.
FT REPEAT     175    198   LRR 6.
FT REPEAT     199    222   LRR 7.
FT REPEAT     223    246   LRR 8.
FT REPEAT     247    270   LRR 9.
FT REPEAT     271    294   LRR 10.
FT REPEAT     295    318   LRR 11.
FT REPEAT     319    343   LRR 12.
FT REPEAT     346    367   LRR 13.
FT REPEAT     368    391   LRR 14.
FT REPEAT     392    415   LRR 15.
FT CARBOHYD    51     51
FT CARBOHYD   181    181
FT CARBOHYD   243    243   POTENTIAL.
FT CARBOHYD   267    267   POTENTIAL.
FT CARBOHYD   298    298
FT CARBOHYD   312    312
FT CARBOHYD   385    385
FT CARBOHYD   499    499   POTENTIAL.
FT CONFLICT    73     74   MT -> TK (IN REF. 2).
FT CONFLICT   109    109   K -> T (IN REF. 2).
FT CONFLICT   130    130   D -> W (IN REF. 3).
FT CONFLICT   136    138   GID -> PGG (IN REF. 3).
FT CONFLICT   209    209   L -> I (IN REF. 2).
FT CONFLICT   267    267   N -> H (IN REF. 3).
FT CONFLICT   327    327   L -> I (IN REF. 2).
FT CONFLICT   478    478   P -> G (IN REF. 2).
FT CONFLICT   509    509   P -> D (IN REF. 2).

SQ SEQUENCE 560 AA; 60959 MW; FD65EDD2 CRC32; 

Query Match 21.8%; Score 217; DB 1; Length 560;
Best Local Similarity 34.4%; Pred. No. 1.03e-23;
Matches 43; Conservative 28; Mismatches 49; Indels 5; Gaps 4;

Db 332 QGAFQGLGELQVIALHSNGLTALPDGLLRG-LGKLRQVSLRRNRLRALPRALFRNLSSLE 390 

:M | : :| | | :| | :: :|| |: |: : IN : :: | |||: 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVR 60 

Db 391 SVQLDHNQLETLPGDVFGALPRLTEVLLGHNSWRCDCGLGPFLG-WLRQHLGLVGGEEPP 449 

: I I:: I: I :| I: : I I: |:| ||: II III : :|:|: I 
Qy 61 LLSLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRR-IVSGN--P 117

Db 450 RCAGP 454 

II I 

Qy 118 RCQKP 122 



RESULT 3
ID CHAD_BOVIN STANDARD; PRT; 361 AA.

AC Q27972; 

DT 01-NOV-1997 (REL. 35, CREATED) 

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE) 

DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE) 

DE CHONDROADHERIN PRECURSOR (CARTILAGE LEUCINE-RICH PROTEIN) (38 KD BONE
DE PROTEIN).
OS BOS TAURUS (BOVINE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC ARTIODACTYLA; RUMINANTIA; PECORA; BOVOIDEA; BOVIDAE; BOVINAE; BOS. 

RN [1] 

RP SEQUENCE FROM N. A., AND PARTIAL SEQUENCE. 

RC TISSUE-CARTILAGE; 

RX MEDLINE; 94342341. 

RA NEAME P.J., SOMMARIN Y., BOYNTON R.E., HEINEGARD D.;

RT "The structure of a 38-kDa leucine-rich protein (chondroadherin) 

RT isolated from bovine cartilage."; 

RL J. BIOL. CHEM. 269:21547-21554(1994). 

RN [2] 

RP SEQUENCE OF 25-55 AND 77-97.
RC
RX MEDLINE; 95113864.
RA HO B., COULSON L., MOYER B., PRICE P.A.;
RT "Isolation and molecular cloning of a novel bone p
RT related in sequence to the cystatin family of thiol protease
RT inhibitors.";
RL J. BIOL. CHEM. 270:431-436(1995).
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; U08018; G470672; -.
DR PFAM; PF00560; LRR; 5.
KW REPEAT; SIGNAL.



FT SIGNAL 1 24 OR 23 (IN SOME ISOFORM(S)).
FT CHAIN 25 361
FT CHAIN 25 352 CHONDROADHERIN, MINOR FORM.
FT DOMAIN 79 317 X 24 AA LEUCINE-RICH TANDEM REPEATS.
FT REPEAT 79 102
FT REPEAT 103 126
FT REPEAT 127 150
FT REPEAT 151 174
FT REPEAT 175 198
FT REPEAT 199 222
FT REPEAT 223 246
FT REPEAT 248 271 8.
FT REPEAT 272 293
FT REPEAT 294 317 10.
FT DISULFID 306 348
FT DISULFID 308 328
FT CONFLICT 25 25 C -> Y (IN REF. 2).
FT CONFLICT 29 29 C -> W (IN REF. 2).
FT CONFLICT 31 31 C -> H (IN REF. 2).
FT CONFLICT 40 40 C -> L (IN REF. 2).
FT CONFLICT 52 52 S -> R (IN REF. 2).
SQ SEQUENCE 361 AA; 40884 MW; A370BB91 CRC32;


Query Match 19.2%; Score 191; DB 1; Length 361;
Best Local Similarity 30.1%; Pred. No. 1.76e-18;
Matches 34; Conservative 31; Mismatches 44; Indels 4; Gaps 4;

Db 223 VEELKLSHNPLKSIPDNAFQSFGRYLETLWLDNTNLEKFSDGAFLGVTTLKHVHLENNRL 282 

1:11:1: I I :: ;|:: I III ; ; |: :| |::::: : I :||: 
Qy 10 VQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRI 69



Db 283 HQL-PSNFP-FDSLETLTLTNNPWKCTCQL-RGLRRWL-EAKTSRPDATCASP 331

: I: I : II I: I :|| :| |:| II :|| : :: I I 
Qy 70 TTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRRIVSGNPRCQKP 122



RESULT 4
ID ALS_PAPPA STANDARD; PRT; 605 AA.

AC O02833;

DT 01-NOV-1997 (REL. 35, CREATED) 

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE INSULIN-LIKE GROWTH FACTOR BINDING PROTEIN COMPLEX ACID LABILE CHAIN 

DE PRECURSOR (ALS). 

GN IGFALS OR ALS.
OS PAPIO PAPIO (GUINEA BABOON).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC PRIMATES; CATARRHINI; CERCOPITHECIDAE; CERCOPITHECINAE; PAPIO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE-LIVER; 

RX MEDLINE; 97040714. 

RA DELHANTY P., BAXTER R.C.; 

RT "The cloning and expression of the baboon acid-labile subunit of the 

RT insulin-like growth factor binding protein complex."; 

RL BIOCHEM. BIOPHYS. RES. COMMUN. 227:897-902(1996). 

CC -!- FUNCTION: INVOLVED IN PROTEIN-PROTEIN INTERACTIONS THAT RESULT
CC IN PROTEIN COMPLEXES, RECEPTOR-LIGAND BINDING OR CELL ADHESION.
CC -!- SUBUNIT: FORMS A TERNARY COMPLEX OF ABOUT 140 TO 150 KD WITH IGF-I
CC OR IGF-II AND IGFBP-3 (BY SIMILARITY).
CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 20.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).



DR EMBL; S83462; E323796; -.

DR PFAM; PF00560; LRR; 10. 

DR HSSP; P23945; 1XUN. 

KW GLYCOPROTEIN; LEUCINE-REPEAT; REPEAT; SIGNAL. 



FT SIGNAL 1 27 BY SIMILARITY.
FT CHAIN 28 605 INSULIN-LIKE GROWTH FACTOR BINDING
FT PROTEIN, ACID LABILE CHAIN.
FT DOMAIN 79 536 LEUCINE-RICH REPEATS.
FT REPEAT 79 89 LRR 1.
FT REPEAT 90 113 LRR 2.
FT REPEAT 114 137 LRR 3.
FT REPEAT 138 161 LRR 4.
FT REPEAT 162 185 LRR 5.




FT REPEAT 186 209 LRR 6.
FT REPEAT 210 233 LRR 7.
FT REPEAT 234 257 LRR 8.
FT REPEAT 258 281 LRR 9.
FT REPEAT 282 305 LRR 10.
FT REPEAT 306 329 LRR 11.
FT REPEAT 330 353 LRR 12.
FT REPEAT 354 377 LRR 13.
FT REPEAT 378 401 LRR 14.
FT REPEAT 402 425 LRR 15.
FT REPEAT 426 449 LRR 16.
FT REPEAT 450 473 LRR 17.


FT REPEAT 474 497 LRR 18.
FT REPEAT 498 521 LRR 19.
FT REPEAT 522 536 LRR 20.
FT CARBOHYD 64 64 POTENTIAL.
FT CARBOHYD 85 85 POTENTIAL.
FT CARBOHYD 96 96 POTENTIAL.
FT CARBOHYD 368 368 POTENTIAL.
FT CARBOHYD 515 515 POTENTIAL.
FT CARBOHYD 580 580 POTENTIAL.
SQ SEQUENCE 605 AA; 66110 MW; 5DF04D42 CRC32;



Query Match 18.6%; Score 185; DB 1; Length 605; 

Best Local Similarity 34.8%; Pred. No. 2.67e-17;

Matches 31; Conservative 28; Mismatches 27; Indels 3; Gaps 3; 



Db 187 DAAFRGLGGLRELVLAGNRLAYLQPALF-SGLAELRELDLSRNALRA-IKANVFAQLPRL 244
::M I ::::lhl:||:| :: I :||: |: I I |: I : : : || |: :
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLML-RSNLIGCVSNDTFAGLSSV 59

Db 245 QKLYLDRNLIAAVAPGAFLGLKALRWLDL 273
: I I I h:::|lll I :| ::|
Qy 60 RLLSLYDNRITTITPGAFTTLVSLSTINL 88



RESULT 5
ID ALS_HUMAN STANDARD; PRT; 605 AA.
AC P35858;
DT 01-JUN-1994 (REL. 29, CREATED)
DT 01-JUN-1994 (REL. 29, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE INSULIN-LIKE GROWTH FACTOR BINDING PROTEIN COMPLEX ACID LABILE CHAIN
DE PRECURSOR (ALS).
GN IGFALS OR ALS.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A., AND PARTIAL SEQUENCE.
RC TISSUE-LIVER;
RX MEDLINE; 92357025.
RA LEONG S.R., BAXTER R.C., CAMERATO T., DAI J., WOOD W.I.;
RT "Structure and functional expression of the acid-labile subunit of
RT the insulin-like growth factor-binding protein complex.";
RL MOL. ENDOCRINOL. 6:870-876(1992).
RN [2]
RP SEQUENCE OF 28-35.
RX MEDLINE; 89308584.
RA BAXTER R.C., MARTIN J.L., BENIAC V.A.;
RT "High molecular weight insulin-like growth factor binding protein
RT complex. Purification and properties of the acid-labile subunit from
RT human serum.";
RL J. BIOL. CHEM. 264:11843-11848(1989).
CC -!- FUNCTION: INVOLVED IN PROTEIN-PROTEIN INTERACTIONS THAT RESULT
CC IN PROTEIN COMPLEXES, RECEPTOR-LIGAND BINDING OR CELL ADHESION.
CC -!- SUBUNIT: FORMS A TERNARY COMPLEX OF ABOUT 140 TO 150 KD WITH IGF-I
CC OR IGF-II AND IGFBP-3.
CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR.
CC -!- TISSUE SPECIFICITY: PLASMA.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 20.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC


DR EMBL; M86826; G184808; -.
DR PIR; A41915; A41915.
DR MIM; 601489; -.
DR PFAM; PF00560; LRR; 10.
DR HSSP; P23945; 1XUN.
KW GLYCOPROTEIN; LEUCINE-REPEAT; REPEAT; SIGNAL.


FT SIGNAL 1 27
FT CHAIN 28 605 INSULIN-LIKE GROWTH FACTOR BINDING
FT PROTEIN, ACID LABILE CHAIN.
FT DOMAIN 79 536 LEUCINE-RICH REPEATS.
FT REPEAT 79 89 LRR 1.
FT REPEAT 90 113 LRR 2.
FT REPEAT 114 137 LRR 3.
FT REPEAT 138 161 LRR 4.
FT REPEAT 162 185 LRR 5.
FT REPEAT 186 209 LRR 6.
FT REPEAT 210 233 LRR 7.
FT REPEAT 234 257 LRR 8.
FT REPEAT 258 281 LRR 9.
FT REPEAT 282 305 LRR 10.
FT REPEAT 306 329 LRR 11.
FT REPEAT 330 353 LRR 12.
FT REPEAT 354 377 LRR 13.
FT REPEAT 378 401 LRR 14.
FT REPEAT 402 425 LRR 15.
FT REPEAT 426 449 LRR 16.
FT REPEAT 450 473 LRR 17.
FT REPEAT 474 497 LRR 18.
FT REPEAT 498 521 LRR 19.
FT REPEAT 522 536 LRR 20.
FT CARBOHYD 64 64 POTENTIAL.
FT CARBOHYD 85 85 POTENTIAL.
FT CARBOHYD 96 96 POTENTIAL.
FT CARBOHYD 368 368 POTENTIAL.
FT CARBOHYD 515 515 POTENTIAL.
FT CARBOHYD 580 580 POTENTIAL.
SQ SEQUENCE 605 AA; 66034 MW; B5027E19 CRC32;


Query Match 18.5%; Score 184; DB 1; Length 605;
Best Local Similarity 34.8%; Pred. No. 4.20e-17;
Matches 31; Conservative 27; Mismatches 28; Indels 3; Gaps 3;



Db 187 DAAFRGLGSLRELVLAGNRLAYLQPALF-SGLAELRELDLSRNALRA-IKANVFVQLPRL 244 

I :: I :||: |: I | |: I : : : | |: : 

Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLML-RSNLIGCVSNDTFAGLSSV 59 

Db 245 QKLYLDRNLIAAVAPGAFLGLKALRWLDL 273 

: I I I |::::IHI I :! ::| 
Qy 60 RLLSLYDNRITTITPGAFTTLVSLSTINL 88 



RESULT 6
ID CBP8_HUMAN STANDARD; PRT; 536 AA.
AC P22792;
DT 01-AUG-1991 (REL. 19, CREATED)
DT 01-AUG-1991 (REL. 19, LAST SEQUENCE UPDATE)

DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE) 

DE CARBOXYPEPTIDASE N 83 KD CHAIN (CARBOXYPEPTIDASE N REGULATORY 

DE SUBUNIT) (FRAGMENT). 

GN CPN2. 

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.

RC TISSUE-LIVER; 

RX MEDLINE; 90094386. 

RA TAN F., WEERASINGHE D.K., SKIDGEL R.A., TAMEI H., KAUL R.K.,

RA RONINSON I.B., SCHILLING J.W., ERDOES E.G.; 

RT "The deduced protein sequence of the human carboxypeptidase N high 

RT molecular weight subunit reveals the presence of leucine-rich tandem 

RT repeats."; 

RL J. BIOL. CHEM. 265:13-19(1990). 

RN [2] 

RP PARTIAL SEQUENCE. 

RX MEDLINE; 88309120. 

RA SKIDGEL R.A., BENNETT C.D., SCHILLING J.W., TAN F., WEERASINGHE D.K.,

RA ERDOES E.G.; 

RT "Amino acid sequence of the N-terminus and selected tryptic peptides 

RT of the active subunit of human plasma carboxypeptidase N: comparison 

RT with other carboxypeptidases.";

RL BIOCHEM. BIOPHYS. RES. COMMUN. 154:1323-1329(1988). 

CC -!- FUNCTION: THE 83 KD SUBUNIT BINDS AND STABILIZES THE CATALYTIC
CC SUBUNIT AT 37 DEGREES CELSIUS AND KEEPS IT IN CIRCULATION. UNDER
CC SOME CIRCUMSTANCES IT MAY BE AN ALLOSTERIC MODIFIER OF THE
CC CATALYTIC SUBUNIT.
CC -!- SUBUNIT: TETRAMER OF TWO CATALYTIC CHAINS AND TWO GLYCOSYLATED
CC INACTIVE CHAINS.
CC -!- SUBCELLULAR LOCATION: SECRETED.
CC -!- PTM: O-GLYCOSYLATED IN THE SER/THR-RICH REGION (POTENTIAL).
CC -!- PTM: WHETHER OR NOT ANY CYS RESIDUES PARTICIPATE IN INTRACHAIN
CC BONDS IS UNKNOWN, BUT THEY DO NOT FORM INTERCHAIN DISULFIDE BONDS
CC WITH THE 50 KD CATALYTIC SUBUNIT.
CC -!- DISEASE: A COMPLETE ABSENCE OF THE ENZYME IS NOT CONSIDERED TO BE
CC COMPATIBLE WITH LIFE.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 12.
CC -!- SIMILARITY: SOME, TO E.COLI YDDK.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).



FT NON_TER 1 1
FT DOMAIN 68 355 LEUCINE-RICH REPEATS.
FT REPEAT 68 91 LRR 1.
FT REPEAT 92 115 LRR 2.
FT REPEAT 116 139 LRR 3.
FT REPEAT 140 163 LRR 4.
FT REPEAT 164 187 LRR 5.
FT REPEAT 188 211 LRR 6.
FT REPEAT 212 235 LRR 7.
FT REPEAT 236 259 LRR 8.
FT REPEAT 260 283 LRR 9.
FT REPEAT 284 307 LRR 10.
FT REPEAT 308 331 LRR 11.
FT REPEAT 332 355 LRR 12.
FT DOMAIN 359 379 THR/SER-RICH.
FT CARBOHYD 53 53 POTENTIAL.
FT CARBOHYD 90 90 POTENTIAL.
FT CARBOHYD 98 98 POTENTIAL.
FT CARBOHYD 207 207 POTENTIAL.
FT CARBOHYD 245 245 POTENTIAL.
FT CARBOHYD 327 327 POTENTIAL.
FT CARBOHYD 338 338 POTENTIAL.
FT CARBOHYD 495 495 POTENTIAL.
SQ SEQUENCE 536 AA; 58649 MW; C4413E03 CRC32;
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Query Match 18.4%; Score 183; DB 1; Length 536; 

Best Local Similarity 37.4%; Pred. No. 6.59e-17; 

Matches 34; Conservative 22; Mismatches 34; Indels 1; Gaps 1; 

Db 117 EG LFQHLMLES LHLQGNQLQALPRRLFQP - LTHLKTLNLAQNLLAQLPEELFHPLTSLQ 175 

II I I::: I I INI::: I I: I: INI I II:: :::: I hi:; 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVR 60 

Db 176 TLKLSNNALSGLPQGVFGKLGSLQELFLDSN 206 

I I :| : II III : I II 
Qy 61 LLSLYDNRITTITPGAFTTLVSLSTINLLSN 91 



RESULT 7 

ID ALS_RAT STANDARD; PRT; 603 AA.

AC P35859; 

DT 01-JUN-1994 (REL. 29, CREATED) 

DT 01-JUN-1994 (REL. 29, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE INSULIN-LIKE GROWTH FACTOR BINDING PROTEIN COMPLEX ACID LABILE CHAIN 

DE PRECURSOR (ALS).
GN IGFALS OR ALS.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.

RC TISSUE-LIVER; 

RX MEDLINE; 93038676. 

RA DAI J., BAXTER R.C.; 

RT "Molecular cloning of the acid-labile subunit of the rat insulin-like 

RT growth factor binding protein complex."; 

RL BIOCHEM. BIOPHYS. RES. COMMUN. 188:304-309(1992). 

RN [2] 

RP SEQUENCE OF 24-44, AND CHARACTERIZATION. 

RC STRAIN-WISTAR; TISSUE-SERUM; 

RX MEDLINE; 94130835. 

RA BAXTER R.C., DAI J.; 

RT "Purification and characterization of the acid-labile subunit of rat 

RT serum insulin-like growth factor binding protein complex."; 

RL ENDOCRINOLOGY 134:848-852(1994). 

CC -!- FUNCTION: MAY HAVE AN IMPORTANT ROLE IN REGULATING THE ACCESS OF
CC CIRCULATING IGFS TO THE TISSUES.
CC -!- SUBUNIT: FORMS A TERNARY COMPLEX OF ABOUT 140 TO 150 KD WITH IGF-I
CC OR IGF-II AND IGFBP-3.
CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR.
CC -!- TISSUE SPECIFICITY: BRAIN, KIDNEY, LUNG, HEART, SPLEEN, MUSCLE
CC AND LIVER.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 20.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).



DR EMBL; S46785; E64972; -. 

DR PIR; JC1282; JC1282. 

DR PFAM; PF00560; LRR; 10. 

DR HSSP; P23945; 1XUN. 

KW GLYCOPROTEIN; LEUCINE-REPEAT; REPEAT; SIGNAL. 



FT SIGNAL 1 23
FT CHAIN 24 603 INSULIN-LIKE GROWTH FACTOR BINDING
FT PROTEIN, ACID LABILE CHAIN.
FT DOMAIN 79 535 LEUCINE-RICH REPEATS.
FT REPEAT 79 89 LRR 1.
FT REPEAT 90 113 LRR 2.
FT REPEAT 114 137 LRR 3.
FT REPEAT 138 161 LRR 4.
FT REPEAT 162 185 LRR 5.
FT REPEAT 186 209 LRR 6.


FT REPEAT 210 233 LRR 7.
FT REPEAT 234 257 LRR 8.
FT REPEAT 258 281 LRR 9.
FT REPEAT 282 305 LRR 10.
FT REPEAT 306 329 LRR 11.
FT REPEAT 330 353 LRR 12.
FT REPEAT 354 377 LRR 13.
FT REPEAT 378 401 LRR 14.
FT REPEAT 402 425 LRR 15.
FT REPEAT 426 449 LRR 16.
FT REPEAT 450 473 LRR 17.


FT REPEAT 474 497 LRR 18.
FT REPEAT 498 521 LRR 19.
FT REPEAT 522 535 LRR 20.
FT CARBOHYD 64 64 POTENTIAL.
FT CARBOHYD 85 85 POTENTIAL.
FT CARBOHYD 96 96 POTENTIAL.
FT CARBOHYD 368 368 POTENTIAL.
FT CARBOHYD 515 515 POTENTIAL.
FT CARBOHYD 578 578 POTENTIAL.
FT CARBOHYD 586 586 POTENTIAL.



SQ SEQUENCE 603 AA; 66811 MW; 5BB22D53 CRC32; 

Query Match 18.0%; Score 179; DB 1; Length 603; 

Best Local Similarity 37.5%; Pred. No. 3.97e-16;
Matches 39; Conservative 22; Mismatches 41; Indels 2; Gaps 2;

Db 356 GAFSGLFNVAVMNLSGNCLRSLPERVFQG-LDKLHSLHLEHSCLGHVRLHTFAGLSGLRR 414 

IM :| : hi I :: I hi! Ml : :| | ||||||::| 
Qy 2 GAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRL 61 

Db 415 LFLRDNSISSIEEQSLAGLSELLELDLTTNRLTHLPRQLFQGLG 458 

I I II I::| ::: I I ::| :| : :| III 
Qy 62 LSLYDNRITTITPGAFTTLVSLSTINLLSNPFNC-NCHLGAGLG 104 



RESULT 8 

ID GPV_RAT STANDARD; PRT; 567 AA.
AC O08770;
DT 15-JUL-1998 (REL. 36, CREATED)
DT 15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE PLATELET GLYCOPROTEIN V PRECURSOR (GPV) (CD42D).

GN GP5. 

OS RATTUS NORVEGICUS (RAT). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN-WISTAR; TISSUE-LIVER; 

RX MEDLINE; 97275136. 

RA RAVANAT C., MORALES M., AZORSA D.O., MOOG S., SCHUHLER S.,
RA GRUNERT P., LOEW D., VAN DORSSELAER A., CAZENAVE J.-P., LANZA F.;
RT "Gene cloning of rat and mouse platelet glycoprotein V:
RT identification of megakaryocyte-specific promoters and demonstration
RT of functional thrombin cleavage.";
RL BLOOD 89:3253-3262(1997).

CC -!- FUNCTION: THE GPIB-V-IX COMPLEX FUNCTIONS AS THE VON WILLEBRAND
CC FACTOR RECEPTOR AND MEDIATES VON WILLEBRAND FACTOR-DEPENDENT
CC PLATELET ADHESION TO BLOOD VESSELS. THE ADHESION OF PLATELETS TO
CC INJURED VASCULAR SURFACES IN THE ARTERIAL CIRCULATION IS A
CC CRITICAL INITIATING EVENT IN HEMOSTASIS (BY SIMILARITY).
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 15.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; Z69594; E222201; -.
DR PFAM; PF00560; LRR; 8.
KW PLATELET; TRANSMEMBRANE; GLYCOPROTEIN; BLOOD COAGULATION;
KW REPEAT; LEUCINE-REPEAT; CELL ADHESION; SIGNAL.
FT SIGNAL 1 16 POTENTIAL.
FT CHAIN 17 567 PLATELET GLYCOPROTEIN V.
FT DOMAIN 17 522 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 523 543 POTENTIAL.
FT DOMAIN 544 567 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 55 415 LEUCINE-RICH REPEATS.
FT REPEAT 55 78 LRR 1.
FT REPEAT 79 102 LRR 2.
FT REPEAT 103 126 LRR 3.
FT REPEAT 127 150 LRR 4.
FT REPEAT 151 174 LRR 5.
FT REPEAT 175 198 LRR 6.
FT REPEAT 199 222 LRR 7.
FT REPEAT 223 246 LRR 8.
FT REPEAT 247 270 LRR 9.
FT REPEAT 271 294 LRR 10.
FT REPEAT 295 318 LRR 11.
FT REPEAT 319 343 LRR 12.
FT REPEAT 346 367 LRR 13.
FT REPEAT 368 391 LRR 14.
FT REPEAT 392 415 LRR 15.
FT CARBOHYD 51 51 POTENTIAL.
FT CARBOHYD 181 181 POTENTIAL.
FT CARBOHYD 243 243 POTENTIAL.
FT CARBOHYD 298 298 POTENTIAL.
FT CARBOHYD 312 312 POTENTIAL.
FT CARBOHYD 385 385 POTENTIAL.
FT CARBOHYD 498 498 POTENTIAL.

SQ SEQUENCE 567 AA; 63344 MW; ABAEC91D CRC32; 

Query Match 17.7%; Score 176; DB 1; Length 567; 

Best Local Similarity 31.2%; Pred. No. 1.51e-15; 

Matches 34; Conservative 26; Mismatches 47; Indels 2; Gaps 2; 

Db 333 GMFHGLTELRVLAVHTNALEELPEDALRG-LGRLRQVSLRHNRLRALPRTLFRNLSSLVT 391 

I 1:1 : :: I : III: ::|| h |: : II I : :: I III: 
Qy 2 GAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRL 61 

Db 392 VQLEHNQLKTLPGDVFAALPQLTRVLLGHNPWLCDCGLWPFL-QKLRHH 439 
m : I I:: I: |::| |: : I II |:| I : I III : 
Qy 62 LSLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKR 110

RESULT 9 

ID CONN_DROME STANDARD; PRT; 682 AA.
AC Q01819;
DT 01-OCT-1993 (REL. 27, CREATED)
DT 01-OCT-1993 (REL. 27, LAST SEQUENCE UPDATE)
DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)
DE CONNECTIN PRECURSOR.
GN CON.
OS DROSOPHILA MELANOGASTER (FRUIT FLY).
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 92370678. 

RA NOSE A., MAHAJAN V.B., GOODMAN C.S.;
RT "Connectin: a homophilic cell adhesion molecule expressed on a subset
RT of muscles and the motoneurons that innervate them in Drosophila.";
RL CELL 70:553-567(1992).
RN [2]
RP SEQUENCE FROM N.A.
RC STRAIN-OREGON-R;
RX MEDLINE; 93202002.
RA GOULD A.P., WHITE R.A.H.;



RT "Connectin, a target of homeotic gene control in Drosophila."; 
RL DEVELOPMENT 116:1163-1174(1992). 

CC -!- FUNCTION: CELL ADHESION PROTEIN INVOLVED IN TARGET RECOGNITION
CC DURING NEUROMUSCULAR DEVELOPMENT. MEDIATES HOMOPHILIC CELLULAR
CC ADHESION.
CC -!- SUBCELLULAR LOCATION: ATTACHED TO THE MEMBRANE BY A GPI-ANCHOR.
CC -!- TISSUE SPECIFICITY: PREDOMINANTLY EXPRESSED IN ABDOMINAL AND
CC THORACIC SEGMENT MUSCLE AND MOTORNEURON CELLS.
CC -!- DEVELOPMENTAL STAGE: EMBRYO.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 10.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; M96647; G157084; -.
DR EMBL; X68701; G7738; -.
DR PIR; S28464; S28464.
DR PIR; A43318; A43318.
DR FLYBASE; FBgn0005775; Con.
DR PFAM; PF00560; LRR; 5.
KW CELL ADHESION; DEVELOPMENTAL PROTEIN; EMBRYO; SIGNAL; GPI-ANCHOR;
KW LEUCINE-REPEAT; REPEAT.



FT SIGNAL 1 24
FT CHAIN 25 665 CONNECTIN.
FT PROPEP 666 682 REMOVED IN MATURE FORM (POTENTIAL).
FT DOMAIN 142 381 LEUCINE-RICH REPEATS.
FT REPEAT 142 165 LRR 1.
FT REPEAT 166 189 LRR 2.
FT REPEAT 190 213 LRR 3.
FT REPEAT 214 237 LRR 4.
FT REPEAT 238 261 LRR 5.
FT REPEAT 262 285 LRR 6.
FT REPEAT 286 299 LRR 7.
FT REPEAT 300 322 LRR 8.
FT REPEAT 324 347 LRR 9.
FT REPEAT 348 381 LRR 10.
FT LIPID 665 665 GPI-ANCHOR (POTENTIAL).
FT CONFLICT 631 631 E -> G (IN REF. 2).
FT CONFLICT 674 677 QVAL -> VALM (IN REF. 2).
SQ SEQUENCE 682 AA; 75992 MW; 3E15592A CRC32;



Query Match 17.3%; Score 172; DB 1; Length 682; 

Best Local Similarity 39.1%; Pred. No. 8.93e-15; 

Matches 36; Conservative 19; Mismatches 34; Indels 3; Gaps 3; 

Db 263 EGLFADMARLTFLNLAHNQINVLTSEIFRG-LGNLNVLKLTRNNL-NFIGDTVFAELWSL 320 

III I : I I: II:: : : III |: |: |:| |:|| ::: II I |: 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLML-RSNLIGCVSNDTFAGLSSV 59 

Db 321 SELELDDNRIERISERALDGLNTLKTLNLRNN 352 

I I Mil I: I: I :| Ml :| 
Qy 60 RLLSLYDNRITTITPGAFTTLVSLSTINLLSN 91 



RESULT 10 

ID GPV_MOUSE STANDARD; PRT; 567 AA.
AC O08742;
DT 15-JUL-1998 (REL. 36, CREATED)
DT 15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE PLATELET GLYCOPROTEIN V PRECURSOR (GPV) (CD42D).
GN GP5.
OS MUS MUSCULUS (MOUSE).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN [1]
RP SEQUENCE FROM N.A.
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RC STRAIN-C57BL/6; TISSUE-LIVER; 

RX MEDLINE; 97275136.
RA RAVANAT C., MORALES M., AZORSA D.O., MOOG S., SCHUHLER S.,
RA GRUNERT P., LOEW D., VAN DORSSELAER A., CAZENAVE J.-P., LANZA F.;
RT "Gene cloning of rat and mouse platelet glycoprotein V:
RT identification of megakaryocyte-specific promoters and demonstration
RT of functional thrombin cleavage.";

RL BLOOD 89:3253-3262(1997). 

CC -!- FUNCTION: THE GPIB-V-IX COMPLEX FUNCTIONS AS THE VON WILLEBRAND
CC FACTOR RECEPTOR AND MEDIATES VON WILLEBRAND FACTOR-DEPENDENT
CC PLATELET ADHESION TO BLOOD VESSELS. THE ADHESION OF PLATELETS TO
CC INJURED VASCULAR SURFACES IN THE ARTERIAL CIRCULATION IS A
CC CRITICAL INITIATING EVENT IN HEMOSTASIS (BY SIMILARITY).
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 15.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; Z69595; E222202; -. 

DR MGD; MGI: 1096363; GP5. 

DR PFAM; PF00560; LRR; 7. 

KW PLATELET; TRANSMEMBRANE; GLYCOPROTEIN; BLOOD COAGULATION; 

KW REPEAT; LEUCINE-REPEAT; CELL ADHESION; SIGNAL. 



FT SIGNAL 1 16 POTENTIAL.
FT CHAIN 17 567 PLATELET GLYCOPROTEIN V.
FT DOMAIN 17 522 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 523 543 POTENTIAL.
FT DOMAIN 544 567 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 55 415 LEUCINE-RICH REPEATS.
FT REPEAT 55 78 LRR 1.
FT REPEAT 79 102 LRR 2.
FT REPEAT 103 126 LRR 3.
FT REPEAT 127 150 LRR 4.
FT REPEAT 151 174 LRR 5.
FT REPEAT 175 198 LRR 6.
FT REPEAT 199 222 LRR 7.
FT REPEAT 223 246 LRR 8.
FT REPEAT 247 270 LRR 9.
FT REPEAT 271 294 LRR 10.
FT REPEAT 295 318 LRR 11.
FT REPEAT 319 343 LRR 12.
FT REPEAT 346 367 LRR 13.
FT REPEAT 368 391 LRR 14.
FT REPEAT 392 415 LRR 15.
FT CARBOHYD 51 51 POTENTIAL.
FT CARBOHYD 67 67 POTENTIAL.
FT CARBOHYD 181 181 POTENTIAL.
FT CARBOHYD 243 243 POTENTIAL.
FT CARBOHYD 298 298 POTENTIAL.
FT CARBOHYD 312 312 POTENTIAL.
FT CARBOHYD 385 385 POTENTIAL.
SQ SEQUENCE 567 AA; 63467 MW; 3AE7515E CRC32;



Query Match 17.2%; Score 171; DB 1; Length 567;

Best Local Similarity 33.7%; Pred. No. 1.39e-14;

Matches 29; Conservative 21; Mismatches 34; Indels 2; Gaps 2; 

Db 286 FGEMAGLRELWLNGTHLSTLPAAAFRN-LSGLQTLGLTRNPRLSALPRGVFQGLRELRVL 344 

I |:::M I I :| |: : :H Nil II I I: :: I II :|:| 
Qy 4 FNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLML-RSNLIGCVSNDTFAGLSSVRLL 62 

Db 345 GLHTNALAELRDDALRGLGHLRQVSL 370 

:| I :: : h I I ::| 
Qy 63 SLYDNRITTITPGAFTTLVSLSTINL 88 



RESULT 11 

ID PKD1_HUMAN STANDARD; PRT; 4303 AA.
AC P98161;
DT 01-OCT-1996 (REL. 34, CREATED)
DT 01-OCT-1996 (REL. 34, LAST SEQUENCE UPDATE)
DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE)
DE POLYCYSTIN PRECURSOR (AUTOSOMAL DOMINANT POLYCYSTIC KIDNEY DISEASE
DE PROTEIN 1).
GN PKD1.
OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 95254638. 

RA GLUECKSMANN-KUIS M.A., TAYBER O., WOOLF E.A., BOUGUELERET L.,
RA DENG N., ALPERIN G.D., IRIS P., HAWKINS F., MUNRO C., LAKEY N.,
RA DUYK G., SCHNEIDER M.C., GENG L., ZHANG F., ZHAO Z., TOROSIAN S.,
RA REEDERS S.T., BORK P., POHLSCHMIDT M., LOEHNING C., KRAUS B.,
RA NOWICKA U., LEUNG A.L.S., FRISCHAUF A.-M.;
RT "Polycystic kidney disease: the complete structure of the PKD1 gene
RT and its protein. The International Polycystic Kidney Disease
RT Consortium.";
RL CELL 81:289-298(1995).

RN [2] 

RP SEQUENCE OF 2769-4303 FROM N.A. 

RX MEDLINE; 94273192. 

RA WARD C.J., PERAL B., HUGHES J., THOMAS S., GAMBLE V.,
RA MACCARTHY A.B., SLOANE-STANLEY J., BUCKLE V.J., KEARNEY L.,
RA HIGGS D.R., RATCLIFFE P.J., HARRIS P.C., ROELFSEMA J.H.,
RA SPRUIT L.L., SARIS J.J., DAUWERSE H.G., PETERS D.J.M.,
RA BREUNING M.H., NELLIST M., BROOK-CARTER P.T., MAHESHWAR M.M.,
RA CORDEIRO I., SANTOS H., CABRAL P., SAMPSON J.R., JANSSEN B.,
RA HESSELING-JANSSEN A.L.W., VAN DEN OUWELAND A.M.W., EUSSEN B.,
RA VERHOEF S., LINDHOUT D., HALLEY D.J.J.;

RT "The polycystic kidney disease 1 gene encodes a 14 kb transcript and 

RT lies within a duplicated region on chromosome 16. The European 

RT Polycystic Kidney Disease Consortium."; 

RL CELL 77:881-894(1994). 

RN [3] 

RP VARIANT ADPKD 3748-ARG-VAL-3752 DEL, AND VARIANT ASP-3632. 

RX MEDLINE; 96108969.
RA PERAL B., SAN MILLAN J.L., ONG A.C.M., GAMBLE V., WARD C.J.,
RA STRONG C., HARRIS P.C.;
RT "Screening the 3' region of the polycystic kidney disease 1 (PKD1)
RT gene reveals six novel mutations.";

RL AM. J. HUM. GENET. 58:86-96(1996). 

RN [4] 

RP VARIANT ALA-4058.
RX MEDLINE; 97295081.
RA CONSTANTINIDES R., XENOPHONTOS S., NEOPHYTOU P., NOMURA S.,
RA PIERIDES A., CONSTANTINOU-DELTAS C.D.;
RT "New amino acid polymorphism, Ala/Val4058, in exon 45 of the
RT polycystic kidney disease 1 gene: evolution of alleles.";
RL HUM. GENET. 99:644-647(1997).

RN [5] 

RP VARIANTS T-2760; P-2761; V-2763; T-2764; Q-2791; T-2826; L-3008 & L-3064.
RX MEDLINE; 97449169.
RA WATNICK T.J., PIONTEK K.B., CORDAL T.M., WEBER H., GANDOLPH M.A.,
RA QIAN P., LENS X.M., NEUMANN H.P.H., GERMINO G.G.;
RT "An unusual pattern of mutation in the duplicated portion of PKD1 is
RT revealed by use of a novel strategy for mutation detection.";
RL HUM. MOL. GENET. 6:1473-1481(1997).

RN [6] 

RP VARIANT ADPKD THR-3678. 

RX MEDLINE; 97403939. 

RA TURCO A.E., ROSSETTI S., BRESIN E., ENGLISCH S., CORRA S.,
RA PIGNATTI P.F.;
RT "Three novel mutations of the PKD1 gene in Italian families with
RT autosomal dominant polycystic kidney disease.";
RL HUM. MUTAT. 10:164-167(1997).

RN [7] 

RP VARIANT ADPKD ASP-4032, AND VARIANT VAL-4045.
RX MEDLINE; 98180892.
RA DANIELLS C., MAHESHWAR M., LAZAROU L., DAVIES F., COLES G., RAVINE D.;
RT "Novel and recurrent mutations in the PKD1 (polycystic kidney
RT disease) gene.";
RL HUM. GENET. 102:216-220(1998).

CC -!- FUNCTION: INVOLVED IN ADHESIVE PROTEIN-PROTEIN AND PROTEIN-
CC CARBOHYDRATE INTERACTIONS.
CC -!- DISEASE: DEFECTS IN PKD1 ARE THE CAUSE OF AUTOSOMAL DOMINANT
CC POLYCYSTIC KIDNEY DISEASE (ADPKD), A COMMON AUTOSOMAL DOMINANT
CC GENETIC DISEASE AFFECTING ABOUT 1 OUT 1000 INDIVIDUALS. IT IS
CC CHARACTERIZED BY PROGRESSIVE FORMATION AND ENLARGEMENT OF CYSTS IN
CC BOTH KIDNEYS, TYPICALLY LEADING TO END-STAGE RENAL DISEASE IN
CC ADULT LIFE. CYSTS ALSO OCCURS IN THE LIVER AND OTHER ORGANS.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 2, FRAMED BY A
CC LRR N-FLANK AND LRR C-FLANK.
CC -!- SIMILARITY: CONTAINS 14 POLYCYSTIC KIDNEY DISEASE DOMAINS (PKD).
CC -!- SIMILARITY: CONTAINS 1 LDL-RECEPTOR CLASS A DOMAIN (ATYPICAL,
CC THE POTENTIAL CALCIUM-BINDING SITE IS MISSING).
CC -!- SIMILARITY: CONTAINS 1 C-TYPE LECTIN FAMILY DOMAIN.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; U24497; G799335; -. 

DR MIM; 601313; -. 

DR MIM; 173900; -. 

DR PROSITE; PS50041; C_TYPE_LECTIN_2; 1.

DR PFAM; PF00560; LRR; 2. 

DR PFAM; PF00801; PKD; 16. 

KW SIGNAL; LEUCINE-REPEAT; LECTIN; REPEAT; GLYCOPROTEIN; TRANSMEMBRANE; 

KW DISEASE MUTATION; POLYMORPHISM.



FT SIGNAL 1 23 POTENTIAL.
FT CHAIN 24 4303 POLYCYSTIN.
FT DOMAIN 638 671 LDL-RECEPTOR CLASS A.
FT DOMAIN 32 65 LRR N-FLANK.
FT DOMAIN 72 119 LEUCINE-RICH REPEATS.
FT REPEAT 72 94 LRR 1.
FT REPEAT 97 119 LRR 2.
FT DOMAIN 123 177 LRR C-FLANK.
FT REPEAT 282 353 PKD 1.
FT DOMAIN 405 534 C-TYPE LECTIN.
FT REPEAT 844 929 PKD 2.
FT REPEAT 930 1014 PKD 3.
FT DOMAIN 1032 2142 13 X 80 AA REPEATS.
FT REPEAT 1032 1124 PKD 4.
FT REPEAT 1138 1209 PKD 5.
FT REPEAT 1221 1292 PKD 6.
FT REPEAT 1305 1377 PKD 7.
FT REPEAT 1390 1463 PKD 8.
FT REPEAT 1477 1545 PKD 9.
FT REPEAT 1559 1629 PKD 10.
FT REPEAT 1643 1715 PKD 11.
FT REPEAT 1729 1799 PKD 12.
FT REPEAT 1815 1883 PKD 13.
FT REPEAT 1898 1968 PKD 14.
FT REPEAT 1983 2058 PKD 15.
FT REPEAT 2071 2142 PKD 16.
FT TRANSMEM 2216 2236 POTENTIAL.
FT TRANSMEM 2546 2566 POTENTIAL.
FT TRANSMEM 2580 2600 POTENTIAL.
FT TRANSMEM 2693 2713 POTENTIAL.
FT TRANSMEM 2808 2828 POTENTIAL.
FT TRANSMEM 3075 3095 POTENTIAL.
FT TRANSMEM 3283 3303 POTENTIAL.
FT TRANSMEM 3324 3344 POTENTIAL.
FT TRANSMEM 3560 3580 POTENTIAL.
FT TRANSMEM 3583 3603 POTENTIAL.
FT TRANSMEM 3674 3694 POTENTIAL.
FT TRANSMEM 3897 3917 POTENTIAL.
FT TRANSMEM 3939 3959 POTENTIAL.
FT TRANSMEM 3980 4000 POTENTIAL.
FT TRANSMEM 4028 4048 POTENTIAL.
FT TRANSMEM 4055 4075 POTENTIAL.
FT TRANSMEM 4086 4106 POTENTIAL.
FT DISULFID 640 653 BY SIMILARITY.
FT DISULFID 647 665 BY SIMILARITY.
FT DISULFID 660 669 BY SIMILARITY.
FT CARBOHYD 50 50 POTENTIAL.
FT CARBOHYD 89 89 POTENTIAL.
FT CARBOHYD 116 116 POTENTIAL.
FT CARBOHYD 121 121 POTENTIAL.
FT CARBOHYD 187 187 POTENTIAL.
FT CARBOHYD 621 621 POTENTIAL.
FT CARBOHYD 632 632 POTENTIAL.
FT CARBOHYD 746 746 POTENTIAL.
FT CARBOHYD 810 810 POTENTIAL.
FT CARBOHYD 841 841 POTENTIAL.
FT CARBOHYD 854 854 POTENTIAL.
FT CARBOHYD 890 890 POTENTIAL.
FT CARBOHYD 921 921 POTENTIAL.
FT CARBOHYD 1004 1004 POTENTIAL.
FT CARBOHYD 1010 1010 POTENTIAL.
FT CARBOHYD 1034 1034 POTENTIAL.
FT CARBOHYD 1072 1072 POTENTIAL.
FT CARBOHYD 1090 1090 POTENTIAL.
FT CARBOHYD 1113 1113 POTENTIAL.
FT CARBOHYD 1178 1178 POTENTIAL.
FT CARBOHYD 1194 1194 POTENTIAL.
FT CARBOHYD 1240 1240 POTENTIAL.
FT CARBOHYD 1269 1269 POTENTIAL.
FT CARBOHYD 1336 1336 POTENTIAL.
FT CARBOHYD 1348 1348 POTENTIAL.
FT CARBOHYD 1382 1382 POTENTIAL.
FT CARBOHYD 1450 1450 POTENTIAL.
FT CARBOHYD 1455 1455 POTENTIAL.
FT CARBOHYD 1474 1474 POTENTIAL.
FT CARBOHYD 1518 1518 POTENTIAL.
FT CARBOHYD 1541 1541 POTENTIAL.
FT CARBOHYD 1554 1554 POTENTIAL.
FT CARBOHYD 1563 1563 POTENTIAL.
FT CARBOHYD 1647 1647 POTENTIAL.
FT CARBOHYD 1661 1661 POTENTIAL.
FT CARBOHYD 1733 1733 POTENTIAL.
FT CARBOHYD 1791 1791 POTENTIAL.
FT CARBOHYD 1834 1834 POTENTIAL.
FT CARBOHYD 1867 1867 POTENTIAL.
FT CARBOHYD 1880 1880 POTENTIAL.
FT CARBOHYD 1991 1991 POTENTIAL.
FT CARBOHYD 2050 2050 POTENTIAL.
FT CARBOHYD 2074 2074 POTENTIAL.
FT CARBOHYD 2125 2125 POTENTIAL.
FT CARBOHYD 2248 2248 POTENTIAL.

Note: remainder of annotations omitted.

Query Match 16.8%; Score 167; DB 1; Length 4303; 

Best Local Similarity 32.2%; Pred. No. 8.06e-14; 

Matches 39; Conservative 27; Mismatches 49; Indels 6; Gaps 6; 

Db 52 SGRGLRTL-GPALRIP-ADATELDVSHNLLRALDVGLLANLSALAELDISNNKISTLEEG 109 

:| I |: I ::| : I : !|: : :| II:: I : :|:|:|: I 
Qy 16 TGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRITTITPG 75 

Db 110 IFANLFNLSEINLSGNPFECDCGL-AWLPQWAEEQQVRWQPEAATCAGPGSLAGQPLLG 168 

I: I :!l III :IM:|:| I I I I : hi : : I I I |: I 
Qy 76 AFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLR-KR-RIV-SGNPRCQKPFFLKEIPIQG 132 

Db 169 I 169 

Qy 133 V 133

RESULT 12

ID ALS_MOUSE STANDARD; PRT; 603 AA.
AC P70389;
DT 01-NOV-1997 (REL. 35, CREATED)
DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE INSULIN-LIKE GROWTH FACTOR BINDING PROTEIN COMPLEX ACID LABILE CHAIN 

DE PRECURSOR (ALS). 

GN IGFALS OR ALS OR ALBS. 

OS MUS MUSCULUS (MOUSE). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=129/SV; 

RX MEDLINE; 96413591. 

RA BOISCLAIR Y.R., SETO D., HSIEH S., HURST K.R., OOI G.T.; 

RT "Organization and chromosomal localization of the gene encoding the 

RT mouse acid labile subunit of the insulin-like growth factor binding
RT complex.";
RL PROC. NATL. ACAD. SCI. U.S.A. 93:10028-10033(1996).

CC -!- FUNCTION: MAY HAVE AN IMPORTANT ROLE IN REGULATING THE ACCESS OF
CC CIRCULATING IGFS TO THE TISSUES.
CC -!- SUBUNIT: FORMS A TERNARY COMPLEX OF ABOUT 140 TO 150 KD WITH IGF-I
CC OR IGF-II AND IGFBP-3 (BY SIMILARITY).
CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 20.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; U66900; G1621613; -. 

DR MGD; MGI:107973; IGFALS.
DR PFAM; PF00560; LRR; 10.
KW GLYCOPROTEIN; LEUCINE-REPEAT; REPEAT; SIGNAL.



FT SIGNAL 1 23 BY SIMILARITY.
FT CHAIN 24 603 INSULIN-LIKE GROWTH FACTOR BINDING
FT PROTEIN, ACID LABILE CHAIN.
FT DOMAIN 79 535 LEUCINE-RICH REPEATS.
FT REPEAT 79 89 LRR 1.
FT REPEAT 90 113 LRR 2.
FT REPEAT 114 137 LRR 3.
FT REPEAT 138 161 LRR 4.
FT REPEAT 162 185 LRR 5.
FT REPEAT 186 209 LRR 6.
FT REPEAT 210 233 LRR 7.
FT REPEAT 234 257 LRR 8.
FT REPEAT 258 281 LRR 9.
FT REPEAT 282 305 LRR 10.
FT REPEAT 306 329 LRR 11.
FT REPEAT 330 353 LRR 12.
FT REPEAT 354 377 LRR 13.
FT REPEAT 378 401 LRR 14.
FT REPEAT 402 425 LRR 15.
FT REPEAT 426 449 LRR 16.
FT REPEAT 450 473 LRR 17.
FT REPEAT 474 497 LRR 18.
FT REPEAT 498 521 LRR 19.
FT REPEAT 522 535 LRR 20.
FT CARBOHYD 64 64 POTENTIAL.
FT CARBOHYD 85 85 POTENTIAL.
FT CARBOHYD 96 96 POTENTIAL.
FT CARBOHYD 368 368 POTENTIAL.
FT CARBOHYD 515 515 POTENTIAL.
FT CARBOHYD 578 578 POTENTIAL.



FT CARBOHYD 586 586 POTENTIAL. 

SQ SEQUENCE 603 AA; 66959 MW; 11ADB606 CRC32; 

Query Match 16.7%; Score 166; DB 1; Length 603; 

Best Local Similarity 32.1%; Pred. No. 1.25e-13;

Matches 34; Conservative 27; Mismatches 43; Indels 2; Gaps 2;

Db 283 EDTFPGLLGLHVLRLAHNAITSLRPRTFKD-LHFLEELQLGHNRIRQLGEKTFEGLGQLE 341 

1:11 ::: I I: I : ::: I |: I | | | | | ::: || ||: : 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVR 60 

Db 342 VLTLNDNQIHEVKVGAFFGLFNVAVMNLSGNCLRSLPEHVFQGLGR 387 

:|:| !h! : III I ::: :M :| : h III: 
Qy 61 LLSLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNC-HLGAGLGK 105



RESULT 13 

ID GPCR_LYMST STANDARD; PRT; 1115 AA.
AC P46023;
DT 01-NOV-1995 (REL. 32, CREATED)
DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)
DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)
DE G-PROTEIN COUPLED RECEPTOR GRL101 PRECURSOR.
OS LYMNAEA STAGNALIS (GREAT POND SNAIL).
OC EUKARYOTA; METAZOA; MOLLUSCA; GASTROPODA; PULMONATA; BASOMMATOPHORA;
OC LYMNAEIDAE; LYMNAEA.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=CNS;
RX MEDLINE; 94255418.
RA TENSEN C.P., VAN KESTEREN E.R., PLANTA R.J., COX K.J.A., BURKE J.F.,
RA VAN HEERIKHUIZEN H., VREUGDENHIL E.;
RT "A G protein-coupled receptor with low density lipoprotein-binding
RT motifs suggests a role for lipoproteins in G-linked signal
RT transduction.";
RL PROC. NATL. ACAD. SCI. U.S.A. 91:4816-4820(1994).

CC -!- FUNCTION: MIGHT DIRECTLY TRANSDUCE SIGNALS CARRIED BY LARGE
CC EXTRACELLULAR (LIPO)PROTEIN(COMPLEXE)S INTO NEURONAL EVENTS.
CC -!- SUBCELLULAR LOCATION: INTEGRAL MEMBRANE PROTEIN.
CC -!- TISSUE SPECIFICITY: PREDOMINANTLY EXPRESSED IN A SMALL NUMBER OF
CC NEURONS WITHIN THE CENTRAL NERVOUS SYSTEM AND TO A LESSER EXTENT
CC IN THE HEART.
CC -!- SIMILARITY: BELONGS TO FAMILY 1 OF G-PROTEIN COUPLED RECEPTORS.
CC -!- SIMILARITY: CONTAINS 12 LDL-RECEPTOR CLASS A DOMAINS.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 6.
CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; Z23104; G438129; -. 

DR PIR; S40241; S40241. 

DR GCRDB; GCRJ860; -.
DR PROSITE; PS00237; G_PROTEIN_RECEPTOR; FALSE_NEG.
DR PROSITE; PS01209; LDLRA_1; 6.
DR PROSITE; PS50068; LDLRA_2; 11.
DR PFAM; PF00001; 7tm_1; 1.
DR PFAM; PF00057; ldl_recept_a; 11.
DR PFAM; PF00560; LRR; 3.
DR HSSP; P01130; 1AJJ.

KW G-PROTEIN COUPLED RECEPTOR; TRANSMEMBRANE; GLYCOPROTEIN; REPEAT;
KW SIGNAL.
FT SIGNAL 1 24 POTENTIAL.
FT CHAIN 25 1115 G-PROTEIN COUPLED RECEPTOR GRL101.
FT DOMAIN 25 767 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 768 788 1 (POTENTIAL).
FT DOMAIN 789 801 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 802 822 2 (POTENTIAL).
FT DOMAIN 823 857 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 858 878 3 (POTENTIAL).
FT DOMAIN 879 887 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 888 908 4 (POTENTIAL).
FT DOMAIN 909 941 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 942 962 5 (POTENTIAL).
FT DOMAIN 963 988 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 989 1009 6 (POTENTIAL).
FT DOMAIN 1010 1017 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 1018 1038 7 (POTENTIAL).
FT DOMAIN 1039 1115 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 32 523 12 X 40 AA APPROXIMATE TANDEM REPEATS
FT SIMILAR TO THE LDL-RECEPTOR CLASS A.
FT DOMAIN 36 79 LDL-RECEPTOR CLASS A 1.
FT DOMAIN 77 115 LDL-RECEPTOR CLASS A 2.
FT DOMAIN 116 155 LDL-RECEPTOR CLASS A 3.
FT DOMAIN 156 196 LDL-RECEPTOR CLASS A 4.
FT DOMAIN 195 232 LDL-RECEPTOR CLASS A 5.
FT DOMAIN 231 269 LDL-RECEPTOR CLASS A 6.
FT DOMAIN 272 318 LDL-RECEPTOR CLASS A 7.
FT DOMAIN 320 363 LDL-RECEPTOR CLASS A 8.
FT DOMAIN 365 403 LDL-RECEPTOR CLASS A 9.
FT DOMAIN 404 442 LDL-RECEPTOR CLASS A 10.
FT DOMAIN 444 485 LDL-RECEPTOR CLASS A 11.
FT DOMAIN 486 525 LDL-RECEPTOR CLASS A 12.
FT DOMAIN 588 731 LEUCINE-RICH REPEATS.
FT REPEAT 588 611 LRR 1.
FT REPEAT 612 635 LRR 2.
FT REPEAT 636 659 LRR 3.
FT REPEAT 660 683 LRR 4.
FT REPEAT 684 707 LRR 5.
FT REPEAT 708 731 LRR 6.
FT DISULFID 38 53 BY SIMILARITY.
FT DISULFID 46 66 BY SIMILARITY.
FT DISULFID 60 77 BY SIMILARITY.
FT DISULFID 79 91 BY SIMILARITY.
FT DISULFID 86 104 BY SIMILARITY.
FT DISULFID 98 113 BY SIMILARITY.
FT DISULFID 118 131 BY SIMILARITY.
FT DISULFID 138 153 BY SIMILARITY.
FT DISULFID 158 170 BY SIMILARITY.
FT DISULFID 165 183 BY SIMILARITY.
FT DISULFID 177 194 BY SIMILARITY.
FT DISULFID 202 220 BY SIMILARITY.
FT DISULFID 214 230 BY SIMILARITY.
FT DISULFID 233 245 BY SIMILARITY.
FT DISULFID 240 258 BY SIMILARITY.
FT DISULFID 252 267 BY SIMILARITY.
FT DISULFID 274 291 BY SIMILARITY.
FT DISULFID 282 304 BY SIMILARITY.
FT DISULFID 298 316 BY SIMILARITY.
FT DISULFID 322 339 BY SIMILARITY.
FT DISULFID 334 352 BY SIMILARITY.
FT DISULFID 346 361 BY SIMILARITY.
FT DISULFID 367 379 BY SIMILARITY.
FT DISULFID 374 392 BY SIMILARITY.
FT DISULFID 386 401 BY SIMILARITY.
FT DISULFID 406 418 BY SIMILARITY.
FT DISULFID 413 431 BY SIMILARITY.
FT DISULFID 425 440 BY SIMILARITY.
FT DISULFID 446 458 BY SIMILARITY.
FT DISULFID 453 474 BY SIMILARITY.
FT DISULFID 465 483 BY SIMILARITY.
FT DISULFID 488 500 BY SIMILARITY.
FT DISULFID 495 513 BY SIMILARITY.
FT DISULFID 507 523 BY SIMILARITY.
FT CARBOHYD 87 87 POTENTIAL.
FT CARBOHYD 166 166 POTENTIAL.
FT CARBOHYD 269 269 POTENTIAL.
FT CARBOHYD 318 318 POTENTIAL.
FT CARBOHYD 482 482 POTENTIAL.
FT CARBOHYD 502 502 POTENTIAL.
FT CARBOHYD 571 571 POTENTIAL.
FT CARBOHYD 618 618 POTENTIAL.
FT CARBOHYD 624 624 POTENTIAL.
FT CARBOHYD 685 685 POTENTIAL.
SQ SEQUENCE 1115 AA; 125865 MW; 2AEC245A CRC32;



Query Match 15.9%; Score 158; DB 1; Length 1115; 

Best Local Similarity 29.5%; Pred. No. 4.01e-12; 

Matches 28; Conservative 16; Mismatches 50; Indels 1; Gaps 1;

Db 648 EDTFSSMIHLTVLDLSNORLTHVYKNMFRG-LKQITVLNISRNQINSIDNGAFNNLANVR 706 

I :h: : I |: :| I hi I : I : I I : I :| |::|| 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVR 60



Db 707 LIDLSGNVIKDIGQKVFMGLPRLVELKTDSYRFCC 741

h I I I I I I I :: I I I
Qy 61 LLSLYDNRITTITPGAFTTLVSLSTINLLSNPFNC 95



RESULT 14 

ID CHAO_DROME STANDARD; PRT; 1134 AA.
AC P12024;
DT 01-OCT-1989 (REL. 12, CREATED)
DT 01-OCT-1989 (REL. 12, LAST SEQUENCE UPDATE)
DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)
DE CHAOPTIN PRECURSOR (PHOTORECEPTOR CELL-SPECIFIC MEMBRANE PROTEIN).
GN CHP OR CHT.
OS DROSOPHILA MELANOGASTER (FRUIT FLY).
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 88135762.
RA REINKE R., KRANTZ D.E., YEN D., ZIPURSKY S.L.;
RT "Chaoptin, a cell surface glycoprotein required for Drosophila
RT photoreceptor cell morphogenesis, contains a repeat motif found in
RT yeast and human.";
RL CELL 52:291-301(1988).

CC -!- FUNCTION: REQUIRED FOR DROSOPHILA PHOTORECEPTOR CELL
CC MORPHOGENESIS. MEDIATES HOMOPHILIC CELLULAR ADHESION.
CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR SURFACE OF R-CELL PLASMA
CC MEMBRANE.
CC -!- DEVELOPMENTAL STAGE: EXPRESSED 24 HOURS AFTER INITIATION OF
CC PHOTORECEPTOR CELL DIFFERENTIATION, PERSISTS THROUGH ADULTHOOD.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 41.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; M19017; G157098; -.
DR EMBL; M19008; G157098; JOINED.
DR EMBL; M19009; G157098; JOINED.
DR EMBL; M19010; G157098; JOINED.
DR EMBL; M19011; G157098; JOINED.
DR EMBL; M19012; G157098; JOINED.
DR EMBL; M19013; G157098; JOINED.
DR EMBL; M19014; G157098; JOINED.
DR EMBL; M19016; G157098; JOINED.
DR PIR; A29944; A29944.
DR FLYBASE; FBgn0000313; chp.
DR PFAM; PF00560; LRR; 17.
KW GLYCOPROTEIN; MEMBRANE; SIGNAL; REPEAT; LEUCINE-REPEAT; VISION.
FT SIGNAL 1 29
FT CHAIN 30 1134 CHAOPTIN.
FT CARBOHYD 77 77 POTENTIAL.
FT CARBOHYD 267 267 POTENTIAL.
FT CARBOHYD 305 305 POTENTIAL.
FT CARBOHYD 339 339 POTENTIAL.
FT CARBOHYD 361 361 POTENTIAL.
FT CARBOHYD 422 422 POTENTIAL.
FT CARBOHYD 680 680 POTENTIAL.
FT CARBOHYD 692 692 POTENTIAL.
FT CARBOHYD 718 718 POTENTIAL.
FT CARBOHYD 746 746 POTENTIAL.
FT CARBOHYD 936 936 POTENTIAL.
FT CARBOHYD 970 970 POTENTIAL.
FT CARBOHYD 1012 1012 POTENTIAL.
FT CARBOHYD 1104 1104 POTENTIAL.
SQ SEQUENCE 1134 AA; 130719 MW; B67A6363 CRC32;



Query Match 15.0%; Score 149; DB 1; Length 1134; 

Best Local Similarity 29.1%; Pred. No. 1.85e-10;

Matches 37; Conservative 36; Mismatches 47; Indels 7; Gaps 7; 

Db 119 DDAFTGLERSLWELILPQNDLVEIPSKSLRH-LQKLRHLDLGYNHITHIQHDSFRGLEDS 177 

: IN I: I: |:| : ::::| I |: I I I I : :|:| I! I 
Qy 1 EGAFNGAA-SVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGL-SS 58 

Db 178 LQTLILRENCISQLMSHSFSGLLILETLDLSGNNLFEIDPNVFVDGMPR-L-TRLLLTDN 235
:: I I :| I: : : :|: I: I |::| :| I: : :: I: : I I ::: I
Qy 59 VRLLSLYDNRITTITPGAFTTLVSLSTINLLSNP-FNCNCHL-GAGLGKWLRKRRIVSGN 116

Db 236 ILSEIPY 242
: I:
Qy 117 PRCQKPF 123



Db 217 AAGAFQGLRQLDMLDLSNNSLASVPEGLWASLGQPNWDMRDGFDISGNPWICD 269

::HI I I ::| :|:: I hll :| :| I:
Qy 73 TPGAFTTLVSLSTINLLSNPFNCNCH-LGAGLG-KW-LRKRRIVSGNPR-CQ 120



Search completed: Fri May 28 08:49:24 1999 
Job time: 21 secs.



RESULT 15 

ID A2GL_HUMAN STANDARD; PRT; 312 AA.
AC P02750;
DT 21-JUL-1986 (REL. 01, CREATED)
DT 21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE)
DT 01-OCT-1994 (REL. 30, LAST ANNOTATION UPDATE)
DE LEUCINE-RICH ALPHA-2-GLYCOPROTEIN (LRG).
OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE. 

RX MEDLINE; 85166241. 

RA TAKAHASHI N., TAKAHASHI Y., PUTNAM F.W.;
RT "Periodicity of leucine and tandem repetition of a 24-amino acid
RT segment in the primary structure of leucine-rich alpha 2-glycoprotein
RT of human serum.";

RL PROC. NATL. ACAD. SCI. U.S.A. 82:1906-1910(1985). 

CC -!- FUNCTION: THE FUNCTION OF THIS PLASMA PROTEIN IS NOT KNOWN.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS.
DR PIR; A03211; NBHUA2.
DR SWISS-2DPAGE; P02750; HUMAN. 
DR PFAM; PF00560; LRR; 4. 



KW PLASMA; GLYCOPROTEIN; REPEAT; LEUCINE-REPEAT. 



FT DISULFID 8 21
FT DISULFID 268 294
FT CARBOHYD 2 2
FT CARBOHYD 44 44
FT CARBOHYD 151 151
FT CARBOHYD 234 234
FT CARBOHYD 290 290
FT CARBOHYD 271 271 POTENTIAL.
SQ SEQUENCE 312 AA; 34346 MW; 48C3DB08 CRC32;


Query Match 14.5%; Score 144; DB 1; Length 312;



Best Local Similarity 32.7%; Pred. No. 1.50e-09; 
Matches 37; Conservative 27; Mismatches 43; Indels 6; Gaps 5; 

Db 158 LDLGENQLETLPPDLLRGPLQ-LERLHLEGNKLQVLGKDLLLPQPDLRYLFLNGNKLARV 216 

I I Mil: ill I I I I :| : :::| : : :| I I |::: : 
Qy 13 LMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRITTI 72



Release 3.1A John F. Collins, Biocomputing Research Unit. 
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 08:49:42 1999; MasPar time 11.79 Seconds 

638.746 Million cell updates/sec 

Tabular output not generated. 

Title: >US-09-191-647-4 

Description: (1-138) from US09191647.pep

Perfect Score: 995 

Sequence: 1 EGAFNGAASVQELMLTGNQL CQKPFFLKEIPIQGVGHPGI 138 

Scoring table: PAM 150 
Gap 11 

Searched: 179066 seqs, 54579741 residues 

Post-processing: Minimum Match 0%

Listing first 45 summaries 

Database: sptrembl9 

1:sp_archea 2:sp_bacteria 3:sp_fungi 4:sp_human
5:sp_invertebrate 6:sp_mammal 7:sp_mhc 8:sp_organelle
9:sp_phage 10:sp_plant 11:sp_rodent 12:sp_unclassified
13:sp_vertebrate 14:sp_virus

Statistics: Mean 41.993; Variance 81.660; scale 0.514 

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.
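
The Query Match column in the summaries appears to be the hit score expressed as a percentage of the Perfect Score given in the run header (995 for this query). The sketch below checks that reading against three rows of the listing. The function name `query_match_percent` and the rounding to one decimal place are assumptions made for illustration and are not taken from the search program.

```python
# Assumption: Query Match (%) = 100 * Score / Perfect Score, to one decimal place.
PERFECT_SCORE = 995  # "Perfect Score" reported in the run header


def query_match_percent(score: int, perfect_score: int = PERFECT_SCORE) -> float:
    """Return a hit score as a percentage of the perfect (self-match) score."""
    return round(100.0 * score / perfect_score, 1)


# (score, reported Query Match %) pairs copied from the summary listing
reported = [(857, 86.1), (680, 68.3), (233, 23.4)]

for score, listed in reported:
    # Print the computed value next to the value printed in the report.
    print(score, query_match_percent(score), listed)
```

For these three rows the computed percentages agree with the listed ones, which supports the assumed relationship without proving it is how the program computes the column.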



SUMMARIES

       Query
Score  Match  Length  DB  ID      Description             Pred. No.
  857   86.1    1523  11  088280  MEGF5.                  1.46e-151
  680   68.3    1531  11  088279  MEGF4.                  4.27e-114
  233   23.4     739   4  075094  MEGF5 (FRAGMENT).       8.60e-24
  227   22.8     880   5  P91643  KEK1 PRECURSOR.         1.10e-22
  223   22.4     184   4  060803  DJ63G5.3 (PUTATIVE LEU  6.01e-22
  215   21.6     733   5  Q24250  TARTAN PROTEIN PRECURS  1.75e-20
  209   21.0    1496   4  Q92626  MYELOBLAST KIAA0230 (F  2.15e-19
  208   20.9    1091  11  P70193  MEMBRANE GLYCOPROTEIN.  3.26e-19
  199   20.0     358  11  070210  CHONDROADHERIN.         1.36e-17
  196   19.7     358  11  055226  CHONDROADHERIN.         4.65e-17
  196   19.7     892   5  P91644  KEK2 PRECURSOR (FRAGME  4.65e-17
  196   19.7    4293  11  008852  POLYCYSTIC KIDNEY DISE  4.65e-17
  195   19.6     907   4  075473  ORPHAN G PROTEIN-COUPL  7.01e-17
  193   19.4     359   4  015335  CHONDROADHERIN.         1.59e-16
  192   19.3     428   4  014498  ISLR PRECURSOR.         2.39e-16
  191   19.2     811   4  075139  KIAA0644 PROTEIN.       3.60e-16
  190   19.1     331  13  093233  PHOSPHOLIPASE A2 INHIB  5.41e-16
  179   18.0     603  11  070211  INSULIN-LIKE GROWTH FA  4.61e-14
  175   17.6    4292   4  Q15141  POLYCYSTIC KIDNEY DISE  2.28e-13
  175   17.6    4302   4  Q15140  POLYCYSTIC KIDNEY DISE  2.28e-13

[Score and Query Match columns are missing from the source for the remaining 25 summaries; Pred. No. values are truncated as shown.]

Length  DB  ID      Description             Pred. No.
  1066   5  Q18902  CODED FOR BY C. ELEGAN  5.04e-
  1385   5  Q26388  TLR-TOLL-LIKE RECEPTOR  5.04e-
  1389   5  Q24591                          5.04e-
   707  11  P97860  LEUCINE-RICH REPEAT PR  7.49e-
   677   6  Q28256  GLYCOPROTEIN IB.        1.11e-
   716  11  Q61809  LEUCINE-RICH-REPEAT PR  1.11e-
  3638   4  Q15142  POLYCYSTIC KIDNEY DISE  2.45e-
   610   5  Q21604  M88.6 PROTEIN.          3.64e-
   653   5  002329  T23G11.6 PROTEIN.       5.39e-
   420   4  Q13641  5T4 ONCOFETAL ANTIGEN   1
   680   5  Q93374  C44H4.3 PROTEIN.        1
   705   4  043377  BAC CLONE RG118D07 FRO  1.18e-
   713   4  075325  GLIOMA AMPLIFIED ON CH  2
   313   4  013288  P37NB.                  3
   738   5  Q93373  C44H4.2 PROTEIN.        1.22e-
   784   4  060603  TOLL/INTERLEUKIN-1 REC  5.
   784   4  015454  TOLL-LIKE RECEPTOR 2.   5.
   789   5  016781  SIMILARITY TO MULTIPLE  1.22e-
  1355   5  016779  CODED FOR BY C. ELEGAN  1.22e-
   658  10  048788  PUTATIVE RECEPTOR KINA  1.79e-
   661   4  Q99467  LYMPHOCYTE ANTIGEN 64   2.
   961   5  P90920  K07A12.2 PROTEIN.       2.62e-
   603   5  Q22075  T01G9.3 PROTEIN.        3.83e-
  1535   5  Q23991  PEROXIDASE PRECURSOR.   3.83e-
   718  13  073675  NEURONAL LEUCINE-RICH   5.59e-


RESULT 1

ID 088280 PRELIMINARY; PRT; 1523 AA.
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF5.
GN MEGF5.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011531; D1033424; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1523 AA; 167767 MW; 2BD845D0 CRC32;



Query Match 86.1%; Score 857; DB 11; Length 1523;

Best Local Similarity [...]9.6%; Pred. No. 1.46e-151;

Matches [...]; Conservative 9; Mismatches 3; Indels 2; [...]

[Alignment rows Db 574 / Qy 1, Db 633 / Qy 61 and Db 692 / Qy 121: sequence text not recoverable from the source.]

RESULT 2

ID 088279 PRELIMINARY; PRT; 1531 AA.

AC 088279; 

DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF4.
GN MEGF4.
OS RATTUS NORVEGICUS (RAT).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011530; D1033423; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1531 AA; 167497 MW; 5C5EBDF4 CRC32;

Query Match 68.3%; Score 680; DB 11; Length 1531; 

Best Local Similarity 65.9%; Pred. No. 4.27e-114; 

Matches 91; Conservative 29; Mismatches 16; Indels 2; Gaps 2; 

Db 582 DGTFEGATSVSELHLTANQLESVRSGMFRG-LDGLRTLMLRNNRISCIHNDSFTGLRNVR 640

:|:|:M:M I M : 1 1 1 1 : 1 : : I | |:|||||: |:|: ||:|:|| :|| 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVR 60 

Db 641 LLSLYDNHITTISPGAFDTLQALSTLNLLANPFNCNCQL-AWLGDWLRKRKIVTGNPRCQ 699 

!IIIIM:IHI:IIII 1 1 I I 1 1 1 1 1 ; 1 1 : 1 1 1 1 1 1 

Qy 61 LLSLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRRIVSGNPRCQ 120 

Db 700 NPDFLRQIPLQDVAFPDF 717 

:| h I : 

Qy 121 KPFFLKEIPIQGVGHPGI 138 



RESULT 3 

ID O75094 PRELIMINARY; PRT; 739 AA.
AC O75094;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF5 (FRAGMENT).
GN MEGF5.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011538; D1033429; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1
SQ SEQUENCE 739 AA; 80364 MW; DC6BCB63 CRC32;

Query Match 23.4%; Score 233; DB 4; Length 739;

Best Local Similarity 36.6%; Pred. No. 8.60e-24; 

Matches 34; Conservative 24; Mismatches 34; Indels 1; Gaps 1; 



Db 12 SNMSHLSTLILSYNRLRCIPVHAFNGLRSLRVLTLHGNDISSVPEGSFNDLTSLSHLALG 71 

: :| I Ihl I : !:: :| II h|:|:| I hi I III : I 

Qy 30 GGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRITTITPGAFTTLVSLSTINLL 89 

Db 72 TNPLHCDCSLRW-LSEWVKAGYREPGIARCSSP 103 

:||::|:| I h I" :| :|| I 
Qy 90 SNPFNCNCHLGAGLGKWLRKRRIVSGNPRCQKP 122 



RESULT 4 

ID P91643 PRELIMINARY; PRT; 880 AA. 

AC P91643; 

DT 01-MAY-1997 (TREMBLREL. 03, CREATED) 

DT 01-MAY-1997 (TREMBLREL. 03, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE KEK1 PRECURSOR.
GN KEK1.
OS DROSOPHILA MELANOGASTER (FRUIT FLY).
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=OREGON R;
RX MEDLINE; 97223506.
RA MUSACCHIO M., PERRIMON N.;
RT "The Drosophila kekkon genes: novel members of both the leucine-rich
RT repeat and immunoglobulin superfamilies expressed in the CNS.";
RL DEV. BIOL. 178:63-76(1996).
DR EMBL; U42767; G1150977; -.
DR FLYBASE; FBgn0015399; kek1.
DR PFAM; PF00047; ig; 1.
DR PFAM; PF00560; LRR; 3.
KW SIGNAL.
FT SIGNAL 1 20 POTENTIAL.
FT CHAIN 21 880 KEK1.
SQ SEQUENCE 880 AA; 94542 MW; 1FF90059 CRC32;

Query Match 22.8%; Score 227; DB 5; Length 880;
Best Local Similarity 32.7%; Pred. No. 1.10e-22;
Matches 37; Conservative 29; Mismatches 44; Indels 3; Gaps 3;

Db 195 PSLRELTLASNHIHKIESQAF-GNTPSLHKLDLSHCDIQTISAQAFGGLQGLTLLRLNGN 253 

:h:|| l::|:: : :"l I ::| I I I :| ::|:|| :: II I I 
Qy 8 ASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDN 67 

Db 254 KLSELLPKTIETLSRLHGIELHDNPWLCDCRLRDT-KLWLMKRNIPYPVAPVC 305 

::: : I :: II I 1:1 II |:|:| || || I : I I 
Qy 68 RITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRRIV-SGNPRC 119 



RESULT 5 

ID O60803 PRELIMINARY; PRT; 184 AA.
AC O60803;
DT 01-AUG-1998 (TREMBLREL. 07, CREATED)
DT 01-AUG-1998 (TREMBLREL. 07, LAST SEQUENCE UPDATE)
DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE)
DE DJ63G5.3 (PUTATIVE LEUCINE RICH PROTEIN) (FRAGMENT).
GN DJ63G5.3.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RA LLOYD D.;
RL SUBMITTED (MAY-1998) TO EMBL/GENBANK/DDBJ DATA BANKS.
DR EMBL; Z94160; E1296586; -.
FT NON_TER 1 1
FT NON_TER 184 184
SQ SEQUENCE 184 AA; 20630 MW; 7F359903 CRC32;

Query Match 22.4%; Score 223; DB 4; Length 184;
Best Local Similarity 36.4%; Pred. No. 6.01e-22;
Matches 39; Conservative 26; Mismatches 40; Indels 2; Gaps 2;

Db 53 DGAFLGQSSLQVLQLGYNKLSNLTEGMLRG-MSRLQFLFVQHNLIEWTPTAFSECPSLI 111 

:MI I :!:! MM: :|| :| I I :: III |: :|: :|: 
Qy 1 EGAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVR 60 

Db 112 SIDLSSNRLSRLDGATFASLASLMVCELAGNPFNCECDLFGFLA-WL 157 

: I II" : ::|::l II =1 :|||||:| I : I: II 
Qy 61 LLSLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWL 107 



RESULT 6
ID Q24250 PRELIMINARY; PRT; 733 AA.
AC Q24250;
DT 01-NOV-1996 (TREMBLREL. 01, CREATED)
DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE TARTAN PROTEIN PRECURSOR.
GN TRN.
OS DROSOPHILA MELANOGASTER (FRUIT FLY).
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=CANTON S;
RX MEDLINE; 94074761.
RA CHANG Z., PRICE B.D., BOCKHEIM S., BOEDIGHEIMER M.J., SMITH R.,
RA LAUGHON A.;
RT "Molecular and genetic characterization of the Drosophila tartan
RT gene.";
RL DEV. BIOL. 160:315-332(1993).
DR EMBL; U02078; G408375; -.
DR FLYBASE; FBgn0010452; trn.
DR PFAM; PF00560; LRR; 6.
KW SIGNAL.
FT SIGNAL 1 24 POTENTIAL.
SQ SEQUENCE 733 AA; 81319 MW; DA426BB0 CRC32;



Query Match 21.6%; Score 215; DB 5; Length 733; 

Best Local Similarity 35.3%; Pred. No. 1.75e-20; 

Matches 41; Conservative 28; Mismatches 43; Indels 4; Gaps 4; 

Db 195 FQAMPSLAELFLGMNTLQSIQAGAFQD-LKGLTRLELKGASLRNISHDSFLGLQELRILD 253 
I : :|: II I I |::::: :|: I II I |:: : :|:|:| II :|:| 
Qy 4 FNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLS 63



Db 254 LSDNRLDRIPSVGLSKLVRLEQLSLGQNDFEVISE-GAFMGLKQL-KRLEVNGALR 307

I III: I : ::: II I ::l I |: II :| II II |:| I 
Qy 64 LYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLG-KWLRKRRIVSGNPR 118



RESULT 7
ID Q92626 PRELIMINARY; PRT; 1496 AA.
AC Q92626;
DT 01-FEB-1997 (TREMBLREL. 02, CREATED)
DT 01-FEB-1997 (TREMBLREL. 02, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE MYELOBLAST KIAA0230 (FRAGMENT). 

GN KIAA0230. 

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=BONE MARROW;
RX MEDLINE; 97191544.
RA NAGASE T., SEKI N., ISHIKAWA K., OHIRA M., KAWARABAYASI Y., OHARA O.,
RA TANAKA A., KOTANI H., MIYAJIMA N., NOMURA N.;
RT "Prediction of the coding sequences of unidentified human genes. VI.
RT The coding sequences of 80 new genes (KIAA0201-KIAA0280) deduced by
RT analysis of cDNA clones from cell line KG-1 and brain.";
RL DNA RES. 3:321-329(1996).
DR EMBL; D86983; D1013908; -.
DR PFAM; PF00047; ig; 4.
DR PFAM; PF00093; vwc; 1.
DR PFAM; PF00141; peroxidase; 1.
DR PFAM; PF00560; LRR; 3.
FT NON_TER 1 1
SQ SEQUENCE 1496 AA; 167209 MW; 5731EE51 CRC32;



Query Match 21.0%; Score 209; DB 4; Length 1496; 

Best Local Similarity 33.7%; Pred. No. 2.15e-19; 

Matches 33; Conservative 27; Mismatches 37; Indels 1; Gaps 



Db 121 GAFEDLENLKYLYLYKNEIQSIDRQAFKG-LASLEQLYLHFNQIETLDPDSFQHLPKLER 179
III: :: I I |::::: ::|:| |::| II: I I : |:| h : 
Qy 2 GAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRL 61 

Db 180 LFLHNNRITHLVPGTFNHLESMKRLRLDSNTLHCDCEI 217 

I I :llll : Ihl M: : I II ::|:| : 
Qy 62 LSLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHL 99 



RESULT 8
ID P70193 PRELIMINARY; PRT; 1091 AA.
AC P70193;
DT 01-FEB-1997 (TREMBLREL. 02, CREATED)
DT 01-FEB-1997 (TREMBLREL. 02, LAST SEQUENCE UPDATE)
DT 01-JAN-1999 (TREMBLREL. 09, LAST ANNOTATION UPDATE)
DE MEMBRANE GLYCOPROTEIN.
OS MUS MUSCULUS (MOUSE).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 96394313.
RA SUZUKI Y., SATO N., TOHYAMA M., WANAKA A., TAKAGI T.;
RT "cDNA cloning of a novel membrane glycoprotein that is expressed
RT specifically in glial cells in the mouse brain LIG-1: A protein with
RT leucine-rich repeats and immunoglobulin-like domains.";
RL J. BIOL. CHEM. 271:22522-22527(1996).
DR EMBL; D78572; D1012081; -.
DR MGD; MGI:107935; IMG.
DR PFAM; PF00047; ig; 3.
DR PFAM; PF00560; LRR; 7.
SQ SEQUENCE 1091 AA; 119283 MW; C0F262F9 CRC32;

Query Match 20.9%; Score 208; DB 11; Length 1091;
Best Local Similarity 33.1%; Pred. No. 3.26e-19;
Matches 45; Conservative 30; Mismatches 57; Indels 4; Gaps 4;

Db 350 EGAFKGLKSLRVLDLDHNEISGTIEDTSGAFTGLDNLSKLTLFGNKIKSVAKRAFSGLES 409 

1111:1 h: I I I" |: I II I I I :| I |:: :|:|| I 
Qy 1 EGAFNGAASVQELMLTGNQLE-TVHG-RGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSS 58 

Db 410 LEHLNLGENAIRSVQFDAFAKMKNLKELYISSESFLCDCQLKW-LPPWLMGRMLQAFVTA 468 

: 1:1 :| I :: II: : :| : : |::| |:|:| I II I : : : 
Qy 59 VRLLSLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRRIVS-GNP 117 

Db 469 TCAHPESLKGQSIFSV 484 

I I II :| :| 
Qy 118 RCQKPFFLKEIPIQGV 133 



RESULT 9
ID O70210 PRELIMINARY; PRT; 358 AA.
AC O70210;
DT 01-AUG-1998 (TREMBLREL. 07, CREATED)
DT 01-AUG-1998 (TREMBLREL. 07, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)



OS RATTUS NORVEGICUS (RAT). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 98129774.
RA SHEN Z., GANTCHEVA S., MANSSON B., HEINEGARD D., SOMMARIN Y.;
RT "Chondroadherin expression changes in skeletal development.";
RL BIOCHEM. J. 330:549-557(1998).
DR EMBL; AF004953; G2947105; -.
SQ SEQUENCE 358 AA; 40403 MW; 47F9BF88 CRC32;

Query Match 20.0%; Score 199; DB 11; Length 358;
Best Local Similarity 30.1%; Pred. No. 1.36e-17;

Matches 34; Conservative 33; Mismatches 42; Indels 4; Gaps 4; 

Db 220 VEELKLSHNPLKSIPDNAFQSFGRYLETLWLDNTNLERFSDAAFAGVTTLKHVHLENNRL 279 

I:||:|: I I :: :|:: Ml I : : |: : I :||: 

Qy 10 VQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRI 69 

Db 280 NQL-PSTFP-FDNLETLTLTNNPWKCTCQL-RGLRRWL-EAKTSRPDATCSSP 328 

: l::l —MM =11 :l hi II M : :: I I 
Qy 70 TTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRRIVSGNPRCQKP 122 

RESULT 10
ID O55226 PRELIMINARY; PRT; 358 AA.
AC O55226;
DT 01-JUN-1998 (TREMBLREL. 06, CREATED)
DT 01-JUN-1998 (TREMBLREL. 06, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE CHONDROADHERIN.
GN CHAD.
OS MUS MUSCULUS (MOUSE).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 98126439.
RA LANDGREN C., BEIER D.R., FAESSLER R., HEINEGARD D., SOMMARIN Y.;
RT "The mouse chondroadherin gene: characterization and chromosomal
RT localization.";
RL GENOMICS 47:84-91(1998).
DR EMBL; U96626; G2843110; -.
DR MGD; MGI:1096866; CHAD.
SQ SEQUENCE 358 AA; 40348 MW; 18A0CDB3 CRC32;

Query Match 19.7%; Score 196; DB 11; Length 358;
Best Local Similarity 29.2%; Pred. No. 4.65e-17;

Matches 33; Conservative 34; Mismatches 42; Indels 4; Gaps 4; 

Db 220 VEELKLSHNPLKSIPDNAFQSFGRYLETLWLDNTNLEKFSDAAFSGVTTLKHVHLDNNRL 279 
m MM: I I :: :|:; || | : : : :|:|::::: : | :||: 
Qy 10 VQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRI 69

Db 280 NQL-PSSFP-FDNLETLTLTNNPWKCTCQL-RGLRRWL-EAKASRPDATCSSP 328 

: h:l : :l h I :|| :| hi II :|| : ;;| I 
Qy 70 TTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRRIVSGNPRCQKP 122 



RESULT 11 

ID P91644 PRELIMINARY; PRT; 892 AA. 

AC P91644; 

DT 01-MAY-1997 (TREMBLREL. 03, CREATED) 

DT 01-MAY-1997 (TREMBLREL. 03, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE KEK2 PRECURSOR (FRAGMENT). 

GN KEK2.
OS DROSOPHILA MELANOGASTER (FRUIT FLY).
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=OREGON R;
RX MEDLINE; 97223506.
RA MUSACCHIO M., PERRIMON N.;
RT "The Drosophila kekkon genes: novel members of both the leucine-rich
RT repeat and immunoglobulin superfamilies expressed in the CNS.";
RL DEV. BIOL. 178:63-76(1996).
DR EMBL; U42768; G1150979; -.
DR FLYBASE; FBgn0015400; kek2.
DR PFAM; PF00047; ig; 1.
DR PFAM; PF00560; LRR; 3.
KW SIGNAL.
FT NON_TER 1 1
FT SIGNAL <1 17 POTENTIAL.
FT CHAIN 18 892 KEK2.
SQ SEQUENCE 892 AA; 97294 MW; 93DCE450 CRC32;

Query Match 19.7%; Score 196; DB 5; Length 892;
Best Local Similarity 31.8%; Pred. No. 4.65e-17;
Matches 42; Conservative 27; Mismatches 59; Indels 4; Gaps 4;

Db 119 TFQDYSSLMRLSLSGNPIRELKTSAFRH-LSFLTTLELSNCQVERIENEAFVGMDNLEWL 177 

:| M I hll : : :|| II I II I : : : Ml h :: I 
Qy 3 AFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLL 62 

Db 178 RLDGNRIGFI-QGTHILPKSLHGISLHSNRWNCDCRL-LDIHFWLVNYNTPLAEEPKCME 235

I III I h II hi II MM : II : :: Ml 
Qy 63 SLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRK-RRIVSGNPRCQK 121 

Db 236 PARLKGQVIKSL 247 

III M: 
Qy 122 PFFLKEIPIQGV 133 



RESULT 12 

ID O08852 PRELIMINARY; PRT; 4293 AA.
AC O08852;
DT 01-JUL-1997 (TREMBLREL. 04, CREATED)
DT 01-JUL-1997 (TREMBLREL. 04, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE POLYCYSTIC KIDNEY DISEASE 1 PROTEIN.
GN PKD1.
OS MUS MUSCULUS (MOUSE).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 97262094.
RA LOHNING C., NOWICKA O., FRISCHAUF A.M.;
RT "The mouse homolog of PKD1: sequence analysis and alternative
RT splicing.";
RL MAMM. GENOME 8:307-311(1997).
DR EMBL; U70209; G2138183; -.
DR PFAM; PF00560; LRR; 1.
DR PFAM; PF00801; PKD; 16.
SQ SEQUENCE 4293 AA; 466545 MW; 10E37A8A CRC32;

Query Match 19.7%; Score 196; DB 11; Length 4293;
Best Local Similarity 34.2%; Pred. No. 4.65e-17;

Matches 40; Conservative 28; Mismatches 43; Indels 6; Gaps 5; 

Db 56 LQTL-GPSLRIP-ADATALDLSHNLLQTLDIGLLVNLSALVELDLSNNRISTLEEGVFAN 113 

hh I ::| : :| I II: : : lh: I I :|||:|: I h 
Qy 20 LETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRITTITPGAFTT 79 

Db 114 LFNLSEINLSGNPFECNCGL-AWLPRWAKEHQVHWQSEATTCRGPIPLAGQPLLSI 169

I :ll III illhlll I I MM : ::| | |: |; | |: :: 
Qy 80 LVSLSTINLLSNPFNCNCHLGAGLGKWLRKR-RIV-SGNPRCQKPFFLKEIPIQGV 133 



RESULT 13
ID O75473 PRELIMINARY; PRT; 907 AA.
AC O75473;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE ORPHAN G PROTEIN-COUPLED RECEPTOR HG38.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 98308104.
RA MCDONALD T., WANG R., BAILEY W., XIE G., CHEN P., CASKEY C.T.,
RA LIU Q.;
RT "Identification and cloning of an orphan G protein-coupled receptor
RT of the glycoprotein hormone receptor subfamily.";
RL BIOCHEM. BIOPHYS. RES. COMMUN. 247:266-270(1998).
DR EMBL; AF062006; G3366802; -.
SQ SEQUENCE 907 AA; 99997 MW; B9147406 CRC32;

Query Match 19.6%; Score 195; DB 4; Length 907;
Best Local Similarity 40.2%; Pred. No. 7.01e-17;
Matches 35; Conservative 16; Mismatches 35; Indels 1; Gaps 1;

Db 108 GAFTGLYSLKVLMLQNNQLRHVPTEALQN-LRSLQSLRLDANHISYVPPSCFSGLHSLRH 166
III I I: III III I ::: I :| :| I :| I: I: |:|| |:| 
Qy 2 GAFNGAASVQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRL 61 

Db 167 LWLDDNALTEIPVQAFRSLSALQAMTL 193 

! Ill :| I II :| :| :: I 
Qy 62 LSLYDNRITTITPGAFTTLVSLSTINL 88 



RESULT 14 

ID O15335 PRELIMINARY; PRT; 359 AA.
AC O15335;
DT 01-JAN-1998 (TREMBLREL. 05, CREATED)
DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE CHONDROADHERIN.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 98008928.
RA GROVER J., CHEN XJ., KORENBERG J.R., ROUGHLEY P.J.;
RT "The structure and chromosome location of the human chondroadherin
RT gene (CHAD).";
RL GENOMICS 45:379-385(1997).
RN [2]
RP SEQUENCE FROM N.A.
RA GROVER J., ROUGHLEY P.J.;
RL SUBMITTED (APR-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.
RN [3]
RP SEQUENCE FROM N.A.
RA GROVER J., ROUGHLEY P.J.;
RL SUBMITTED (APR-1998) TO EMBL/GENBANK/DDBJ DATA BANKS.
DR EMBL; U96769; G2104968; -.
DR EMBL; U96767; G2104968; JOINED.
DR EMBL; U96768; G2104968; JOINED.
DR PFAM; PF00560; LRR; 5.
SQ SEQUENCE 359 AA; 40487 MW; 22CBC94E CRC32;

Query Match 19.4%; Score 193; DB 4; Length 359;
Best Local Similarity 30.1%; Pred. No. 1.59e-16;

Matches 34; Conservative 31; Mismatches 44; Indels 4; Gaps 4; 

Db 221 VEELKLSHNPLKSIPDNAFQSFGRYLETLWLDNTNLEKFSDGAFLGVTTLKPHLENNRL 280 

1:11:1: I I :: :|:: Mil:: |: :| |::;:: : | :||; 
Qy 10 VQELMLTGNQLETVHGRGFRGGLSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRI 69 

Db 281 NQL-PSNFP-FDSLETLALTNNPWKCTCQL-RGLRRWL-EAKASRPDATCASP 329

: I: I : II I: I :|| :| |:| II :|| : :: I I 
Qy 70 TTITPGAFTTLVSLSTINLLSNPFNCNCHLGAGLGKWLRKRRIVSGNPRCQKP 122 






RESULT 15
ID O14498 PRELIMINARY; PRT; 428 AA.
AC O14498;
DT 01-JAN-1998 (TREMBLREL. 05, CREATED)
DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE ISLR PRECURSOR.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=RETINA;
RX MEDLINE; 97468140.
RA NAGASAWA A., KUBOTA R., IMAMURA Y., NAGAMINE K., WANG Y., ASAKAWA S.,
RA KUDOH J., MINOSHIMA S., MASHIMA Y., OGUCHI Y., SHIMIZU N.;
RT "Cloning of the cDNA for a new member of the immunoglobulin
RT superfamily (ISLR) containing leucine-rich repeat (LRR).";
RL GENOMICS 44:273-279(1997).
DR EMBL; AB003184; D1023718; -.
DR PFAM; PF00047; ig; 1.
DR PFAM; PF00560; LRR; 3.
KW SIGNAL.
FT SIGNAL 1 18 POTENTIAL.
SQ SEQUENCE 428 AA; 45997 MW; C83DFE0B CRC32;

Query Match 19.3%; Score 192; DB 4; Length 428;
Best Local Similarity 35.3%; Pred. No. 2.39e-16;
Matches 36; Conservative 20; Mismatches 42; Indels 4; Gaps 4;

Db 121 LSALQLLKMDSNELTFIPRDAFRSLRALRSLQLNHNRLHTLAEGTFTPLTALSHLQINEN 180 

IN I- II : :: N :| ::| I I II: |:: Nl I :|| : : I 
Qy 32 LSGLKTLMLRSNLIGCVSNDTFAGLSSVRLLSLYDNRITTITPGAFTTLVSLSTINLLSN 91 

Db 181 PFDCTCGI-VWLKTWALTTAVSIPEQDNIACTSPHVLKGTPL 221

IN I : I I I I I I I III: 
Qy 92 PFNCNCHLGAGLGKW-LRKR-RIVS-GNPRCQKPFFLKEIPI 130 



Search completed: Fri May 28 08:50:23 1999 
Job time : 41 secs.
Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 08:52:24 1999; MasPar time 8.51 Seconds

399.730 Million cell updates/sec

Tabular output not generated.

Title: XJS-09-191-647-5

Description: (1-160) from US09191647.pep

Perfect Score: 1212

Sequence: 1 WPRCECMPGYAGDNCSENQD NGGNDHIAVXLYXGHVRFSY 160

Scoring table: PAM 150
Gap 11

Searched: 170751 seqs, 21266608 residues

Post-processing: Minimum Match 0%

Listing first 45 summaries

Database: a-geneseq35

1:part1 2:part2 3:part3 4:part4 5:part5 6:part6 7:part7
8:part8 9:part9 10:part10 11:part11 12:part12 13:part13
14:part14 15:part15 16:part16 17:part17 18:part18
19:part19 20:part20 21:part21 22:part22 23:part23
24:part24 25:part25 26:part26 27:part27 28:part28
29:part29 30:part30 31:part31 32:part32 33:part33
34:part34 35:part35 36:part36 37:part37 38:part38
39:part39

Statistics: Mean 30.075; Variance 148.234; scale 0.203

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.

Result No.  Score  Query Match  Length  DB  ID      Description            Pred. No.

Human serrate 1.
19
Human Serrate-1 (HJ1)
271 22.4 685 37
Nucleotide sequence o
Mouse receptor ME2.
Human serrate 2 prote
22.1 1212 29
Human serrate 2.
Human Serrate-2 (HJ2)
24
Human partial mature
Human membrane protei
Sequence of a serrate
3.21e-10

27          236    19.5         157     21  W11730  H-Delta-1 polypeptide  7.98e-10
28          229    18.9         2409    3   R12609  Versican.              2.84e-09
29          227    18.7         833     6   R28960  Delta Dll.             4.08e-09
30          225    18.6         1257    9   R46627  Neurocan core protein  5.86e-09
31          220    18.2         1572    24  W27160  Mouse receptor ME2 re  1.45e-08
32          218    18.0         385     10  R56167  Neuroendocrine tumor   2.07e-08
33          207    17.1         473     17  R86869  Adhesive protein.      1.49e-07
34          206    17.0         77      6   R28962  ELR-11 and -12.        1.79e-07
35          204    16.8         4544    11  R60517  Human alpha-2-MR.      2.55e-07
36          204    16.8         4544    9   R47861  Alpha 2-Macroglobulin  2.55e-07
37                 16.6         383     10          Neuroendocrine tumor
38                 14.9                     R05222  Antigen GX5401FL enco  1.79e-05
39                              228     30  W46967  Amino acid sequence o
40                                                  Murine Del-1 truncate
41                 14.1
42                 13.7                             Human developmental
43                 13.3                     W37500  Human nel-related pro  4.86e-04
44                                          R85442  Bovine brevican core   8.14e-04
45                 13.0         2476    39  W67738                         9.67e-04



RESULT 1
ID W46966 standard; Protein; 1534 AA.
AC W46966;
DT 06-JUL-1998 (first entry)
DE Amino acid sequence of a human slit-like polypeptide.
KW Slit-like protein; human; diagnosis; treatment; brain-specific disease;
KW cancer; antibody.
OS Homo sapiens.
FH Key Location/Qualifiers
FT Peptide 1..26
FT /note= "signal peptide"
FT Protein 27..1534
FT /note= "mature protein"
PN J10087699-A.
PD 07-APR-1998.
PF 15-JUL-1997; 205351.
PR 16-JUL-1996; JP-186219.
PA (ASAH ) ASAHI KASEI KOGYO KK.
DR WPI; 98-267127/24.
DR N-PSDB; V16978.
PT Human Slit-like protein - useful for diagnosis and treatment of
PT brain-specific diseases and cancers
PS Disclosure; Pages 31-35; 45pp; Japanese.
CC The present sequence represents a novel human slit-like protein (the
CC mature protein is claimed in Claim 1). The slit-like polypeptide is
CC useful for diagnosis and treatment of brain-specific diseases and
CC cancers. Antibodies directed against the protein, or its fragments
CC can also be used for diagnosing cancer.
SQ Sequence 1534 AA;

Query Match 95.5%; Score 1158; DB 30; Length 1534;

Best Local Similarity 95.6%; Pred. No. U0e-87; 

Matches 152; Conservative 0; Mismatches 7; Indels 0; Gaps 0; 

Db 1067 prcecmpgyagdncsenqddcrdhrcqngaqcmdevnsysclcaegysgqlceipphlpa 1126 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
Qy 2 PRCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPA 61 

Db 1127 pkspcegtecqngancvdqgnrpvcqclpgfggpecekllsvnfvdrdtylqftdlqnvp 1186 
lllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
Qy 62 PKSPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFTDLQNWX 121

Db 1187 ranitlqvstaedngillyngdndhiavelyqghvrvsy 1225

I HUM 1 1 1 ! 1 1 1 1 1 1 1 1 llllll II llll II 
Qy 122 RXNITLQVFTAEDNGILLYNGGNDHIAVXLYXGHVRFSY 160
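
The two percentages printed in each result header appear to follow from the counts printed beside them: Query Match is the Score as a share of the Perfect Score, and Best Local Similarity is Matches as a share of all aligned columns. This reading is inferred from the figures in this listing, not from the program's documentation, and the function names below are hypothetical. A minimal Python sketch under that assumption:

```python
def query_match_percent(score: int, perfect_score: int) -> float:
    """Score as a percentage of the query's Perfect Score (assumed formula)."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity(matches: int, conservative: int,
                          mismatches: int, indels: int) -> float:
    """Identical positions as a percentage of aligned columns (assumed formula)."""
    aligned_columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / aligned_columns, 1)


if __name__ == "__main__":
    # RESULT 1 above: Score 1158 against Perfect Score 1212.
    print(query_match_percent(1158, 1212))
    # RESULT 1 above: Matches 152; Conservative 0; Mismatches 7; Indels 0.
    print(best_local_similarity(152, 0, 7, 0))
```

The same arithmetic can be used to cross-check header lines where a digit or a percent sign has been damaged, by recomputing the value from the Score and the match counts of that result.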



RESULT 2 

ID R25079 standard; Protein; 1480 AA.
AC R25079;
DT 05-JAN-1993 (first entry)
DE Drosophila SLIT protein involved in axon pathway development.
KW Neurogenesis; EGF-like repeats; epidermal growth factor; TAGON;
KW embryonic CNS; leucine-rich repeat; Flank-LRR-Flank;
KW midline glial cells; axonogenesis; cell-cell interaction; ss.
OS Drosophila melanogaster.
FH Key Location/Qualifiers
FT peptide 1..36
FT /label= signal
FT domain 73..294
FT /label= Flank_LRR_Flank_1
FT /note= "mediates adhesive events"
FT domain 295..518
FT /label= Flank_LRR_Flank_2
FT /note= "mediates adhesive events"
FT domain 519..714
FT /label= Flank_LRR_Flank_3
FT /note= "mediates adhesive events"
FT domain 715..910
FT /label= Flank_LRR_Flank_4
FT /note= "mediates adhesive events"
FT region 911..1150
FT /label= Tandem_EGF_like_repeats
FT /note= "involved in protein-protein interactions"
FT region 1353..1393
FT /label= 7th_EGF_like_repeat
FT /note= "involved in receptor-ligand interactions"
FT region 1394..1404
FT /label= alternative_splice_segment
FT /note= "developmentally regulated"
FT region 1405..1480
FT /label= C-terminal_region
PN WO9210518-A.
PD 25-JUN-1992.
PF 27-NOV-1991; U09055.
PR 07-DEC-1990; US-624135.
PA (UYYA ) UNIV YALE.
PI Artavanis-Tsakonas S, Rothberg JM;
DR WPI; 92-234590/28.
DR N-PSDB; Q25811.
PT SLIT protein and sequence elements for treating
PT neuro-degenerative disease - useful for Alzheimer's disease,
PT nerve damage and Parkinson's disease, for diagnosis of cancer
PS Claim 1; Page 84-89; 122pp; English.
CC The SLIT protein is necessary for normal development of the midline
CC of the CNS, partic. the midline glial cells, and for the
CC concomitant formation of the commissural axon pathways. The process
CC is dependent on the level of SLIT protein expression. It appears
CC that SLIT protein is excreted by the midline glial cells where it
CC is synthesised and is eventually associated with the surfaces of
CC axons that traverse them. The SLIT protein is tightly localised to
CC the muscle attachment sites and to the sites of contact between
CC adjacent pairs of cardioblasts as they coalesce to form the lumen
CC of the larval heart. The SLIT protein defines a new set of
CC molecules (TAGONS) which play a key role in axon outgrowth and
CC pathfinding. SLIT can be used as a nerve regenerative in
CC neurodegenerative diseases such as Alzheimer's Disease, spinal cord
CC injuries, brain injuries, crushed (optic) nerve, amyotrophic lateral
CC sclerosis, diabetes-caused nerve damage, Parkinson's Disease,
CC strokes, epilepsy, multiple sclerosis, paraplegia, retinal
CC degeneration and facial nerve damage. The 4 "Flank-Leucine-rich
CC region-Flank" domains, the C-terminal region and part of the
CC alternative splice segment (i.e. GEGSTEPFTVT) are all individually
CC claimed as are molecules comprising at least 1 Flank-LRR-Flank domain
CC and at least 1 EGF-like repeat element from the SLIT protein.
CC See also R29102.
SQ Sequence 1480 AA;

Query Match 40.3%; Score 489; DB 5; Length 1480;

Best Local Similarity 40.5%; Pred. No. 1.58e-30; 

Matches 66; Conservative 42; Mismatches 48; Indels 7; Gaps 5; 

Db 1050 cdcqagfhgtnctdniddcqnhmcqnggtcvdgindyqcrcpddytgkyceghnmismmy 1109 

1:1 :|: I ll::l lll::l lllh hi :l I I I:: hi II :: 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPA-- 61 

Db 1110 pqtspcqnheckhgvcfqpnaqgsdylcrchpgytgkwceyltsisfvhnnsfveleplr 1169 

I III: II :| II: :|:| II: I II I l::|| :::::: |: 
Qy 62 PK-SPCEGTECQNGA--NCVDQGNRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFTDLQ 118 

Db 1170 trpeanvti-vfssaeqngilmydgqdahlavelfngrirvsy 1211 

: 1:1: lh Ihlllhhl : IMI I: l::l II 
Qy 119 NWXRXNITLQVFT-AEDNGILLYNGGNDHIAVXLYXGHVRFSY 160 



RESULT 3 

ID W18348 standard; protein; 520 AA. 

AC W18348; 

DT 11-FEB-1998 (first entry)
DE Proliferation and differentiation suppression polypeptide.
KW Proliferation; differentiation; suppression; human; delta-1;
KW serrate-1; blood cell; neuron; leukaemia; malignant tumour;
KW immunosuppression.
OS Homo sapiens.
PN WO9719172-A1.
PD 29-MAY-1997.
PF 15-NOV-1996; J03356.
PR 30-NOV-1995; JP-311811.
PR 17-NOV-1995; JP-299611.
PA (ASAH ) ASAHI KASEI KOGYO KK.
PI Itoh A, Sakano S;
DR WPI; 97-298110/27.
PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress
PT proliferation and differentiation of undifferentiated human blood
PT cells
PS Claim 3; Page 59-61; 114pp; Japanese.
CC The present sequence represents a polypeptide which suppresses
CC proliferation and differentiation of undifferentiated cells such
CC as neurons and blood cells. The polypeptide may be used for the
CC prevention and control of disorders involving undifferentiated
CC cells, such as leukaemia and malignant tumours, and improvement of
CC blood formation, e.g. after immunosuppression.
SQ Sequence 520 AA;

Query Match 26.2%; Score 318; DB 25; Length 520;
Best Local Similarity 43.4%; Pred. No. 2.11e-16;

Matches 43; Conservative 24; Mismatches 26; Indels 6; Gaps 5; 

Db 408 crcqagfsgrhcddnvddcasspcanggtcrdgvndfsctcppgytgrncs-ap-v-s-r 463

I I :h:l :l :| III I lh I I II :|| h 1 1 : 1 : I :| : : : 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 464 --cehapchngatcherghryvcecargyggpncqfllp 500 

II : 1:111 I ::hl 1 1 : 1 1 : 1 1 1 : 1 : lh 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLS 102 



RESULT 4 

ID W18349 standard; protein; 702 AA, 

AC W18349; 

DT ll-FEB-1998 (first entry) 

DE Proliferation and differentiation suppression polypeptide, 

KW Proliferation; differentiation; suppression; human; delta-1; 

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour; 

KW immunosuppression. 

OS Homo sapiens, 
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PN W09719172-A1. 

PD 29-MAY-1997 . 

PF 15-NOV-1996; J03356. 

PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 ■ suppress 

pt proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 4; Page 61-64; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells . The polypeptide may be used for the 

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of

CC blood formation, e.g. after immunosuppression.

SQ Sequence 702 AA;

Query Match 26.2%; Score 318; DB 25; Length 702;

Best Local Similarity 43.4%; Pred. No. 2.11e-16;

Matches 43; Conservative 24; Mismatches 26; Indels 6; Gaps 5; 

Db 408 crcqagfsgrhcddnvddcasspcanggtcrdgvndfsctcppgytgrncs-ap-v-s-r 463 

I I :|::l =1 =1 III I II; I I II :|| |: ||:|: I :| : : : 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 464 --cehapchngatcherghryvcecargyggpncqfllp 500

II : hill I ::hl IN 1 : 1 1 1 : 1 : II: 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLS 102



RESULT 5 

ID W18353 standard; protein; 723 AA. 

AC W18353;

DT 11-FEB-1998 (first entry)

DE Proliferation and differentiation suppression polypeptide. 

KW Proliferation; differentiation; suppression; human; delta-1; 

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour; 

KW immunosuppression.

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT Peptide 1..21

FT /label= Signal

FT Protein 22..723

FT /label= Differentiation_suppression_protein

PN WO9719172-A1.

PD 29-MAY-1997.

PF 15-NOV-1996; J03356. 

PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

DR N-PSDB; T70174. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 15; Page 77-82; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression. 

SQ Sequence 723 AA; 

Query Match 26.2%; Score 318; DB 25; Length 723;

Best Local Similarity 43.4%; Pred. No. 2.11e-16; 

Matches 43; Conservative 24; Mismatches 26; Indels 6; Gaps 5; 

Db 429 crcqagfsgrhcddnvddcasspcanggtcrdgvndfsctcppgytgrncs-ap-v-s-r 484



I I :i::i :i :i III I 1 1 : I I 1 1 : 1 1 1 : 1 1 : 1 : I :| : : : 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 485 --cehapchngatcherghryvcecargyggpncqfllp 521 

II : hill I ::|:| Ihl I : I ! I : I : lh 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLS 102 

RESULT 6 

ID W11725 standard; Protein; 660 AA. 

AC W11725; 

DT 28-APR-1997 (first entry) 

DE H-Delta-1 polypeptide (reading frame 1 product). 

KW H-Delta-1; cell proliferation; nervous system disorder; 

KW tissue regeneration; Notch; cervix cancer; breast cancer; 

KW lung cancer; colon cancer; melanoma; seminoma; 

KW neurogenesis; therapy. 



OS Homo sapiens.

FH Key Location/Qualifiers 

FT region 219..221
FT /note= "Delta-1 homologous region"
FT region 245..246
FT /note= "Delta-1 homologous region"
FT region 259..428
FT /note= "Delta-1 homologous region"
FT region 430..434
FT /note= "Delta-1 homologous region"
FT region 594..597
FT /note= "Delta-1 homologous region"
FT region 605..608
FT /note= "Delta-1 homologous region"
FT region 615..617
FT /note= "Delta-1 homologous region"
FT misc_difference 32
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 40
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 41
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 87
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 138
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 139
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 145
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 162
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 187
FT /note= "undetermined amino acid residue"
FT misc_difference 203
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 230
FT /note= "undetermined amino acid residue"
FT misc_difference 249
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 429
FT /note= "undetermined amino acid residue"
FT misc_difference 447
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 498
FT /note= "undetermined amino acid residue"
FT misc_difference 513
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 541
FT /note= "undetermined amino acid residue"
FT misc_difference 552
FT /note= "undetermined amino acid residue"
FT misc_difference 556
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 564
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 576
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 580
FT /note= "undetermined amino acid residue"
FT misc_difference 619
FT /note= "undetermined amino acid residue"
FT misc_difference 621
FT /note= "undetermined amino acid residue"
FT misc_difference 626
FT /note= "undetermined amino acid residue"
FT misc_difference 630
FT /note= "undetermined amino acid residue"
FT misc_difference 634
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 639
FT /note= "undetermined amino acid residue"
FT misc_difference 642
FT /note= "undetermined amino acid residue"
FT misc_difference 643
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 644
FT /note= "undetermined amino acid residue"
FT misc_difference 647
FT /note= "residue corresponds to stop codon in
FT H-Delta-1 contig"
FT misc_difference 648
FT /note= "undetermined amino acid residue"
FT misc_difference 651
FT /note= "undetermined amino acid residue"
FT misc_difference 652
FT /note= "undetermined amino acid residue"

PN WO9701571-A1.

PD 16-JAN-1997.

PF 28-JUN-1996; U11178.

PR 28-JUN-1995; US-000589.

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY.

PA (UYYA ) UNIV YALE.

PI Artavanis-Tsakonas S, Gray GE, Henrique D, Ish-Horowicz D;

PI Lewis J;

DR WPI; 97-100159/09. 

DR N-PSDB; T59454. 

PT New vertebrate Delta protein, DNA and antibodies - for treating and 

PT preventing cancer, nervous system disorders and for tissue 

PT regeneration 

PS Disclosure; Fig 12B1-6; 135pp; English. 

CC Polypeptide sequences (W11725-27) were determined for all 3 

CC reading frames of a human H-delta-1 contig sequence (T59454) obtd. 

CC from a foetal brain library. Errors in the contig sequence meant 

CC that no single reading frame gave the correct sequence for the 

CC H-Delta-1 protein. The 3 polypeptide sequences were therefore 

CC compared to chick and mouse Delta-1 sequences (see also W11719-20) 

CC and regions of homology (see also W11728-38) identified. H-Delta-1 

CC is the human homologue of Drosophila Delta, a protein that binds 

CC to Notch protein. H-Delta-1 polypeptides can be used to treat 

CC disorders of cell fate or differentiation, such as cancer, and 

CC nervous system disorders, or to promote tissue regeneration and 



CC repair.

SQ Sequence 660 AA; 



Query Match 25.2%; Score 306; DB 21; Length 660;

Best Local Similarity 42.4%; Pred. No. 1.99e-15;

Matches 42; Conservative 24; Mismatches 27; Indels 6; Gaps 3; 

Db 341 crcqagfsgrhcddnvddcasspcanggtcrdgvndfsctcppgytgrnc--sa--pasr 396 

I I :h:l :| :| III MM II :|| |: ||:|: | :: ||:: 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 397 --cehapchngatcherghryxcecarsyggpncxfllp 433

II : hill I ::hl |:| ::|||:| l|: 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLS 102 



RESULT 7

ID W11719 standard; Protein; 727 AA. 

AC W11719; 

DT 28-APR-1997 (first entry)

DE C-Delta-1 polypeptide. 

KW C-Delta-1; cell proliferation; nervous system disorder; 

KW tissue regeneration; Notch; cervix cancer; breast cancer; 

KW lung cancer; colon cancer; melanoma; seminoma;

KW neurogenesis; therapy. 

OS Gallus sp. 

FH Key Location/Qualifiers 

FT domain 184..228
FT /label= DSL
FT domain 229..261
FT /label= EGF1
FT domain 262..292
FT /label= EGF2
FT domain 293..332
FT /label= EGF3
FT domain 333..370
FT /label= EGF4
FT domain 371..409
FT /label= EGF5
FT domain 410..447
FT /label= EGF6
FT domain 448..485
FT /label= EGF7
FT domain 486..523
FT /label= EGF8
FT domain 524..534
FT /label= EGF9
FT domain 555..579
FT /label= TM
FT /note= "transmembrane domain"

PN WO9701571-A1. 

PD 16-JAN-1997.

PF 28-JUN-1996; U11178.

PR 28-JUN-1995; US-000589.

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY.

PA (UYYA ) UNIV YALE.

PI Artavanis-Tsakonas S, Gray GE, Henrique D, Ish-Horowicz D; 

PI Lewis J; 

DR WPI; 97-100159/09. 

DR N-PSDB; T58897. 

PT New vertebrate Delta protein, DNA and antibodies - for treating and 

PT preventing cancer, nervous system disorders and for tissue 

PT regeneration 

PS Disclosure; Fig 2; 135pp; English.

CC C-delta-1 polypeptide (W11719) is the chick homologue of Drosophila

CC Delta, a protein that binds to Notch protein. Expression of

CC C-Delta-1 correlates with onset of neurogenesis. The C-delta-1

CC amino acid sequence was deduced from a cDNA clone (T58897) obtd.

CC from chick stage 4-6 embryos. An alternatively spliced variant 

CC (W00876) was also isolated, and mouse (W11720) and human (W11721- 

CC 38) Delta-1 polypeptides have been identified. Delta-1 proteins 

CC can be used to treat or prevent disorders characterised by 

CC increased Notch activity, such as cervical, breast, lung or colon
CC cancer, melanoma or seminoma, and nervous system disorders or to
CC promote tissue regeneration and repair. 
SQ Sequence 727 AA; 

Query Match 25.1%; Score 304; DB 21; Length 727;

Best Local Similarity 43.4%; Pred. No. 2.89e-15;

Matches 43; Conservative 21; Mismatches 29; Indels 6; Gaps 4;

Db 436 cqcqagftgrhcddnvddcasfpcvnggtcqdgvndysctcppgyngkncstp--v-s-r 491 

hi :l :| III I II: I I 1 1 III |: Ihl II : : : 

Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 492 --cehnpchngatchersnryvcecargygglncqfllp 528

II hill I :::M Ihl hi :|: lh 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLS 102 

RESULT 8

ID W00876 standard; Protein; 740 AA.

AC W00876; 

DT 28-APR-1997 (first entry) 

DE C-Delta-1 polypeptide (alternatively spliced variant).

KW C-Delta-1; cell proliferation; nervous system disorder; 

KW tissue regeneration; Notch; cervix cancer; breast cancer; 

KW lung cancer; colon cancer; melanoma; seminoma; 

KW neurogenesis; therapy.

OS Gallus sp. 



FH Key Location/Qualifiers
FT domain 184..228
FT /label= DSL
FT domain 229..261
FT /label= EGF1
FT domain 262..292
FT /label= EGF2
FT domain 293..332
FT /label= EGF3
FT domain 333..370
FT /label= EGF4
FT domain 371..409
FT /label= EGF5
FT domain 410..447
FT /label= EGF6
FT domain 448..485
FT /label= EGF7
FT domain 486..523
FT /label= EGF8
FT domain 524..534
FT /label= EGF9
FT domain 555..579
FT /label= TM
FT /note= "transmembrane domain"
PN WO9701571-A1.
PD 16-JAN-1997.
PF 28-JUN-1996; U11178.
PR 28-JUN-1995; US-000589.
PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY.
PA (UYYA ) UNIV YALE.



PI Artavanis-Tsakonas S, Gray GE, Henrique D, Ish-Horowicz D; 

PI Lewis J; 

DR WPI; 97-100159/09. 

DR N-PSDB; T58898.

PT New vertebrate Delta protein, DNA and antibodies - for treating and 

PT preventing cancer, nervous system disorders and for tissue 

PT regeneration 

PS Disclosure; Fig 2; 135pp; English.

CC C-delta-1 polypeptide (W00876) is the chick homologue of Drosophila 

CC Delta, a protein that binds to Notch protein. Expression of 

CC C-Delta-1 correlates with onset of neurogenesis. The C-delta-1 

CC amino acid sequence was deduced from a cDNA clone (T58898) obtd. 

CC from chick stage 4-6 embryos. A shorter version (W58877) of 

CC C-Delta-1, lacking the 12 C-terminal amino acids of the longer 

CC version, was also isolated, and mouse (W11720) and human (W11721- 

CC 38) Delta-1 polypeptides have been identified. Delta-1 proteins



CC can be used to treat or prevent disorders characterised by 

CC increased Notch activity, such as cervical, breast, lung or colon 

CC cancer, melanoma or seminoma, and nervous system disorders or to 

CC promote tissue regeneration and repair. 

SQ Sequence 740 AA; 

Query Match 25.1%; Score 304; DB 21; Length 740;

Best Local Similarity 43.4%; Pred. No. 2.89e-15;

Matches 43; Conservative 21; Mismatches 29; Indels 6; Gaps 4; 

Db 436 cqcqagftgrhcddnvddcasfpcvnggtcqdgvndysctcppgyngkncstp--v-s-r 491 

1:1 :!::l :l :l III I II: I I II Ml h 11:1 I I : : ; 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 492 --cehnpchngatchersnryvcecargygglncqfllp 528 

II hill I :::H Ihl 1 : 1 1 :|: lh 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLS 102 



RESULT 9 

ID W11720 standard; Protein; 722 AA. 

AC W11720; 

DT 28-APR-1997 (first entry) 

DE M-Delta-1 polypeptide. 

KW M-Delta-1; cell proliferation; nervous system disorder; 

KW tissue regeneration; Notch; cervix cancer; breast cancer; 

KW lung cancer; colon cancer; melanoma; seminoma; 

KW neurogenesis; therapy. 

OS Mus sp. 

PN WO9701571-A1. 

PD 16-JAN-1997. 

PF 28-JUN-1996; U11178.

PR 28-JUN-1995; US-000589. 

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY. 

PA (UYYA ) UNIV YALE. 

PI Artavanis-Tsakonas S, Gray GE, Henrique D, Ish-Horowicz D; 

PI Lewis J; 

DR WPI; 97-100159/09. 

DR N-PSDB; T58899. 

PT New vertebrate Delta protein, DNA and antibodies - for treating and 

PT preventing cancer, nervous system disorders and for tissue 

PT regeneration 

PS Claim 4; Fig 8; 135pp; English. 

CC M-delta-1 polypeptide (W11720) is the mouse homologue of Drosophila 

CC Delta, a protein that binds to Notch protein. It is expressed 

CC primarily in presomitic mesoderm, the central and peripheral 

CC nervous systems, and kidney. Chick (W11719) and human (W11721- 

CC 38) Delta-1 polypeptides have also been identified. Delta-1

CC proteins can be used to treat or prevent disorders characterised by 

CC increased Notch activity, such as cervical, breast, lung or colon 

CC cancer, melanoma or seminoma, as well as nervous system disorders, 

CC and to promote tissue regeneration and repair. 

SQ Sequence 722 AA; 

Query Match 24.9%; Score 302; DB 21; Length 722; 

Best Local Similarity 42.4%; Pred. No. 4.19e-15;

Matches 42; Conservative 22; Mismatches 29; Indels 6; Gaps 5; 

Db 428 crcqagfsgrycednvddcasspcanggtcrdsvndfsctcppgytgkncs-ap-v-s-r 483 

I I :h:l I :l III I lh I I II :M h Ihl I :| : : : 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 484 --cehapchngatchqrgqrpcecaqgyggpncqfllp 520

II : hill I ::! I :hl hllhh lh 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLS 102 



RESULT 10 

ID W49698 standard; Protein; 2321 AA. 
AC W49698; 

DT 21-DEC-1998 (first entry) 
DE Human Notch3 protein.

KW Human; Notch3; transmembrane receptor; lateral inhibition; regulation;
KW developmental cascade; neurogenic gene; mutant; neurological disorder;

KW cerebral autosomal dominant arteriopathy; subcortical infarct; CADASIL; 

KW leukoencephalopathy; therapy. 

OS Homo sapiens. 

PN FR2751986-A1. 

PD 06-FEB-1998. 

PF 16-APR-1997; 004680.

PR 01-AUG-1996; FR-009733.

PA (INRM ) INSERM INST NAT SANTE & RECH MEDICALE. 

PI Bach JF, Bousser MG, Joutel A, Tournier Lasserve E; 

DR WPI; 98-133138/13. 

DR N-PSDB; V57001. 

PT Human Notch3 nucleic acids - and methods for identifying 

PT pre-disposition to cerebral autosomal dominant arteriopathy with 

PT sub-cortical infarcts and leukoencephalopathy 

PS Claim 2; Fig 1.1-1.8; 45pp; French. 

CC This sequence represents the human Notch3 protein, a transmembrane 

CC receptor protein involved in lateral inhibition and regulating 

CC developmental cascades of neurogenic genes. Mutated Notch3 proteins 

CC are thought to be involved in neurological disorders, especially of 

CC the cerebral autosomal dominant arteriopathy with subcortical infarcts
CC and leukoencephalopathy (CADASIL) type. Blocking expression of a
CC mutated Notch3 gene or by substitution therapy with non-mutated Notch3

CC gene or protein can be used to treat CADASIL or related disorders. 

SQ Sequence 2321 AA; 

Query Match 24.6%; Score 298; DB 36; Length 2321;

Best Local Similarity 42.7%; Pred. No. 8.83e-15;

Matches 41; Conservative 15; Mismatches 39; Indels 1; Gaps 1; 

Db 1108 ceclpgyngdnceddvdecasqpcqhggscidlvarylcscppgtlgvlceineddcgpg 1167 

111:111 MM hi : Ihh hi I I I h I I llll :| 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 1168 ppldsgprclhngtcvdlvggfrctcppgytglrce 1203 

= 1 : I I : : III I I 1 1 : I II 
Qy 64 SPCE-GTECQNGANCVDQGNRPVCQCLPGFGGPECE 98 



RESULT 11
ID W68510 standard; Protein; 1872 AA.
AC W68510;
DT 06-JAN-1999 (first entry)
DE Partial human Notch-3 protein.
KW Human; Notch3; transmembrane receptor; lateral inhibition; regulation;
KW developmental cascade; neurogenic gene; mutant; neurological disorder;
KW cerebral autosomal dominant arteriopathy; subcortical infarct; CADASIL;
KW leukoencephalopathy.
OS Homo sapiens.
FH Key Location/Qualifiers
FT Misc_difference 328
FT /note= "encoded by NAN"
FT Misc_difference 401
FT /note= "encoded by GNN"
FT Misc_difference 403
FT /note= "encoded by GNC"
FT Misc_difference 406
FT /note= "encoded by GNN"
FT Misc_difference 409
FT /note= "encoded by NNT"
FT Misc_difference 420
FT /note= "encoded by GNC"
FT Misc_difference 706
FT /note= "encoded by NNN"
FT Misc_difference 708
FT /note= "encoded by CCN"
FT Misc_difference 719
FT /note= "encoded by CGN"
FT Misc_difference 728
FT /note= "encoded by CNT"
FT Misc_difference 729
FT /note= "encoded by GTN"
FT Misc_difference 759..789
FT /note= "encoded by NNN"
FT Misc_difference 1425
FT /note= "encoded by GNA"

PN FR2751985-A1. 

PD 06-FEB-1998. 

PF 01-AUG-1996; 009733.

PR 01-AUG-1996; FR-009733.

PA (INRM ) INSERM INST NAT SANTE & RECH MEDICALE.

PI Bach JF, Bousser MG, Joutel A, Tournier Lasserve E;

DR WPI; 98-133137/13. 

DR N-PSDB; V57163. 

PT Human Notch3 nucleic acids - and methods for identifying

PT pre-disposition to cerebral autosomal dominant arteriopathy with 

PT sub-cortical infarcts and leukoencephalopathy 

PS Claim 2; Fig la-lg; 42pp; French. 

CC This sequence represents a partial human notch3 protein, a transmembrane 

CC receptor protein involved in lateral inhibition and regulating 

CC developmental cascades of neurogenic genes. Mutated Notch3 proteins 

CC are thought to be involved in neurological disorders, especially of the 

CC cerebral autosomal dominant arteriopathy with subcortical infarcts and 

CC leukoencephalopathy (CADASIL) type. Blocking expression of a mutated 

CC Notch3 gene or by substitution therapy with non-mutated Notch3 gene or

CC protein can be used to treat CADASIL or related disorders. 

SQ Sequence 1872 AA; 

Query Match 24.0%; Score 291; DB 36; Length 1872;

Best Local Similarity 41.7%; Pred. No. 3.24e-14;

Matches 40; Conservative 16; Mismatches 39; Indels 1; Gaps 1; 

Db 1042 ceclpgyngdnceddvdecasqpcqhggscidlvarylcscppgtlgvfceineddcgpg 1101 

llhlll llll :: hi : 1 1 : 1 : hi I II h I I :IM :| 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 1102 ppldsgprclhngtcvdlvggfrctcppgytglrce 1137 

:| : I I : : III I I lh I II 
Qy 64 SPCE-GTECQNGANCVDQGNRPVCQCLPGFGGPECE 98 



RESULT 12 

ID W05835 standard; Protein; 1193 AA. 

AC W05835; 

DT 28-JAN-1997 (first entry) 

DE Chick Serrate. 

KW C-Serrate; Notch; cell differentiation; cell fate; tissue repair; 

KW central nervous system; cancer; therapy; diagnosis.

OS Gallus sp. 



FH Key Location/Qualifiers 

FT domain 1..1041
FT /label= Extracellular_domain
FT peptide 1..5
FT /label= Sig_peptide
FT /note= "lacks the N-terminal portion owing to
FT truncation of the encoding cDNA clone"
FT domain 158..203
FT /label= DSL
FT /note= "region of homology with Drosophila Delta
FT and Serrate, predicted to mediate binding
FT with Notch"
FT domain 208..837
FT /label= ELR
FT /note= "epidermal growth factor-like repeat domain"
FT region 208..238
FT /label= ELR1
FT region 239..274
FT /label= ELR2
FT region 275..313
FT /label= ELR3
FT region 314..351
FT /label= ELR4
FT region 352..390
FT /label= ELR5
FT region 391..427
FT /label= ELR6
FT region 428..464
FT /label= ELR7
FT region 465..502
FT /label= ELR8
FT region 503..540
FT /label= ELR9
FT region 541..606
FT /label= ELR10
FT region 607..644
FT /label= ELR11
FT region 655..682
FT /label= ELR12
FT region 683..721
FT /label= ELR13
FT region 722..759
FT /label= ELR14
FT region 760..797
FT /label= ELR15
FT region 798..837
FT /label= ELR16
FT region 854..911
FT /label= Cysteine-rich_region
FT domain 1042..1066
FT /label= Transmembrane_domain
FT domain 1067..1193
FT /label= Intracellular_domain
PN WO9627610-A1.
PD 12-SEP-1996.





PF 07-MAR-1996; D03172. 

PR 07-MAR-1995; US-400159. 

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY. 

PA (UYYA ) UNIV YALE. 

PI Artavanis-Tsakonas S, Gray GE, Henrique DMP, Ish-Horowicz D; 

PI Lewis JH, Mann RS, Myat AM; 

DR WPI; 96-425379/42. 

DR N-PSDB; T40092. 

PT Vertebrate Serrate protein and related DNA - used to treat or 

PT prevent malignancies characterised by increased Notch activity. 

PS Disclosure; Page 112-115; 161pp; English. 

CC Chicken Serrate (W05835), or C-Serrate, is a ligand for the zygotic 

CC neurogenic locus Notch and is believed to play a major role in 

CC determining cell fates in the central nervous system. Its amino 

CC acid sequence was deduced from a cDNA clone (T40092) obtd. from an 

CC optic explant cDNA library. C-Serrate is expressed in the central 

CC nervous system, cranial placodes, nephric mesoderm, vascular 

CC system, and limb bud mesenchyme.

SQ Sequence 1193 AA;

Query Match 23.4%; Score 284; DB 19; Length 1193; 

Best Local Similarity 40.8%; Pred. No. 1.19e-13;

Matches 40; Conservative 17; Mismatches 35; Indels 6; Gaps 3; 

Db 446 rcicspgyagdhcekdinecasnpcmngghcqdeingfqclcpagfsgnlc----qldi- 500

II I 111111:1 : ::l : I lh:| ll:|:: III: hit II :| 
Qy 3 RCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAP 62 

Db 501 dy-cepnpcqngaqcfnlamdyfcncpedyegkncshl 537 

II IMII I : : I I : I :| I 
Qy 63 KSPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKL 100 



RESULT 13 

ID W18351 standard; protein; 1036 AA. 

AC W18351; 

DT 11-FEB-1998 (first entry)

DE Proliferation and differentiation suppression polypeptide. 

KW Proliferation; differentiation; suppression; human; delta-1; 

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour; 

KW immunosuppression. 

OS Homo sapiens.

PN WO9719172-A1.

PD 29-MAY-1997. 

PF 15-NOV-1996; J03356. 



PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK.

PI Itoh A, Sakano S;

DR WPI; 97-298110/27.

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 5; Page 66-71; 114pp; Japanese.

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the 

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression.

SQ Sequence 1036 AA; 

Query Match 23.3%; Score 283; DB 25; Length 1036; 

Best Local Similarity 40.8%; Pred. No. 1.43e-13; 

Matches 40; Conservative 18; Mismatches 34; Indels 6; Gaps 3;

Db 441 rcicppgyagdhcerdidecasnpclngghcqneinrfqclcptgfsgnlc----qldi- 495 

II I lllllhl : 1:1 : I ll::| :|:| : III: hi || :| 
Qy 3 RCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAP 62 

Db 496 dy-cepnpcqngaqcynrasdyfckcpedyegkncshl 532 

II lllll I :::: II : I :| I 
Qy 63 KSPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKL 100 



RESULT 14 

ID W18352 standard; protein; 1187 AA. 

AC W18352; 

DT 11-FEB-1998 (first entry)

DE Proliferation and differentiation suppression polypeptide. 

KW Proliferation; differentiation; suppression; human; delta-1;

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour; 

KW immunosuppression.

OS Homo sapiens.

PN WO9719172-A1.

PD 29-MAY-1997.

PF 15-NOV-1996; J03356.

PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK.

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 6; Page 71-76; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression. 

SQ Sequence 1187 AA; 

Query Match 23.3%; Score 283; DB 25; Length 1187; 

Best Local Similarity 40.8%; Pred. No. 1.43e-13; 
Matches 40; Conservative 18; Mismatches 34; Indels 6; Gaps 3;

Db 441 rcicppgyagdhcerdidecasnpclngghcqneinrfqclcptgfsgnlc----qldi- 495 

II I hhlh! : |:| : | ||::| :|:| ; llh hi II :| 
Qy 3 RCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAP 62 

Db 496 dy-cepnpcqngaqcynrasdyfckcpedyegkncshl 532 

II lllll I :::: II : I :| I 
Qy 63 KSPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKL 100 



RESULT 15

ID W44301 standard; Protein; 1218 AA.
AC W44301;
DT 19-JUN-1998 (first entry)
DE Human serrate 1.
KW Human; serrate 2; regulation; stem cell; differentiation; neoplasm;
KW leukaemia; endothelial cell; tumour.
OS Homo sapiens.
FH Key Location/Qualifiers
FT Peptide 1..31
FT /label= Signal
FT Protein 32..1218
FT /label= Serrate-1
PN WO9802458-A1.
PD 22-JAN-1998.
PF 11-JUL-1997; J02414.
PR 14-MAY-1997; JP-124063.
PR 16-JUL-1996; JP-186220.
PA (ASAH ) ASAHI KASEI KOGYO KK.
PI Itoh A, Sakano S;
DR WPI; 98-110528/10.
DR N-PSDB; V15201.
PT Human serrate-2 gene expression products - used to regulate stem
PT cell differentiation, useful in treating neoplasms, e.g. leukaemia
PS Disclosure; Page 77-86; 103pp; Japanese.
CC The present sequence represents human serrate 1, from the present
CC invention which describes human serrate 2. The present invention also
CC describes a method for the preparation of the polypeptides, and
CC antibodies binding to the polypeptide and its fragments. The polypeptide
CC and its fragments expressed by the serrate-2 gene can be used to inhibit
CC stem (especially blood stem) cell differentiation and to inhibit
CC endothelial cell growth. They may be incorporated in a cell culture
CC media for culturing undifferentiated stem cells. They can also be used
CC for treatment of neoplasms such as leukaemia. The antibodies can be used
CC for the diagnosis of malignant tumours.
SQ Sequence 1218 AA;

Query Match 23.3%; Score 283; DB 29; Length 1218;
Best Local Similarity 40.8%; Pred. No. 1.43e-13;
Matches 40; Conservative 18; Mismatches 34; Indels 6; Gaps 3;



Db 472 rcicppgyagdhcerdidecasnpclngghcqneinrfqclcptgfsgnlc----qldi- 526

II I lllllhl : 1:1 : I ll::| :|:| : III: Ml II :| 
Qy 3 RCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAP 62

Db 527 dy-cepnpcqngaqcynrasdyfckcpedyegkncshl 563 

II UNI I :::: II : I :| I 
Qy 63 KSPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKL 100



Search completed: Fri May 28 08:53:26 1999
time : 62 secs.
*******************



Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 08:53:43 1999; MasPar time 9.24 Seconds 

693.556 Million cell updates/sec 

Tabular output not generated. 

Title: >US-09-191-647-5 
Description: (1-160) from US09191647.pep
Perfect Score: 1212 

1 WPRCECMPGYAGDNCSENQD NGGNDHIAVXLYXGHVRFSY 160 



Scoring table: PAM 150 
Gap 11 

Searched: 122810 seqs, 40068593 residues 

Post-processing: Minimum Match 0%

Listing first 45 summaries 

Database: pir60

1:pir1 2:pir2 3:pir3 4:pir4

Statistics: Mean 10.603; Variance 78.520; scale 0.517

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution.
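
The Query Match figure reported for each hit appears to be the hit's score expressed as a percentage of the Perfect Score in the run header (1212 for this query). This relationship is inferred from the numbers printed in the listing rather than taken from the program's documentation, and the function name is hypothetical. A minimal Python sketch:

```python
def query_match(score, perfect_score):
    """Score as a percentage of the query's perfect (self-match) score.

    Assumed definition, inferred from the printed figures.
    """
    return round(100.0 * score / perfect_score, 1)


PERFECT_SCORE = 1212  # "Perfect Score" from the run header above

# Scores of the first, fourth and ninth summary rows.
for score in (491, 343, 298):
    print(score, query_match(score, PERFECT_SCORE))
```

Under this assumption, a damaged percentage in a summary row can be cross-checked against its score, which usually survives extraction intact.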



                                SUMMARIES

                     %
Result             Query
   No.   Score   Match  Length  DB  ID       Description              Pred. No.
     1     491    40.5     530   2  A31640   epidermal growth fact    3.64e-76
     2     489    40.3    1469   2  B36665   slit protein 2 precur    9.43e-76
     3     489    40.3    1480   2  A36665   slit protein 1 precur    9.43e-76
     4     343    28.3     293   2  B26637   neurogenic repetitive    5.00e-46
     5     343    28.3    2139   2  A35672   crumbs protein - frui    5.00e-46
     6     308    25.4    1064   2  A40136   fibropellin Ia - sea     4.27e-39
     7     304    25.1     728      I50719   C-Delta-1 - chicken      2.61e-38
     8     302    24.9     722   2  I48324   DELTA-like 1 - mouse     6.44e-38
     9     298    24.6    2321   2  S78549   notch3 protein - huma    3.91e-37
    10     294    24.3    1203   2  A49175   Motch B protein - mou    2.36e-36
    11     291    24.0    2437      S42612   transmembrane protein    9.10e-36
    12     287    23.7     570      A48836   fibropellin C precurs    5.47e-35
    13     287    23.7    2471      A49128   cell-fate determining    5.47e-35
    14     283    23.3    1220      A56136   jagged protein precur    3.28e-34
    15     279    23.0    2318   2  S45306   notch 3 protein - mou    1.95e-33
    16     273    22.5     387   2  B49175   Motch A protein - mou    2.83e-32
    17     273    22.5    2531      A46019   gene Notch-1 protein     2.83e-32
    18     272    22.4    2531   2  S18188   notch protein homolog    4.41e-32
    19     272    22.4    2555   2  A40043   notch protein homolog    4.41e-32
    20     269    22.2    2703      A24420   notch protein - fruit    1.67e-31
    21     268    22.1    2524      A35844   Xotch protein - Afric    2.60e-31
    22     260    21.5     861      A48825   Notch homolog Motch p    8.92e-30
    23     246    20.3    1429      S06434   homeotic protein lin-    4.15e-27
    24     241    19.9    1404   2  A36666   solicits protein precu   3.67e-26
    25     241    19.9    1408   2  S16148   serrate protein          3.67e-26
    26     231    19.1    3562   2  A47171   chondroitin sulfate p    2.79e-24
    27     229    18.9     102   2  B55885   chondroitin sulfate p    5.61e-24
    28     229    18.9    2409   2  A60979   versican precursor -     5.61e-24
    29     227    18.7     832   2  A31246   ucmuycuiL piuLcin uc     1.56e-23
    30     227    18.7     833   2  S19087   yciic uyiLa piULclil pi  1.56e-23
    31     227    18.7     880   2  S00670   gene Delta protein pr    1.56e-23
    32     226    18.6     200   2  A26637                            2.40e-23
    33     225    18.6     862   2  S43922   versican ■ pi^'taiiefl   3.69e-23
    34     225    18.6    1257   2  S28764   ueimJL.au iai            3.69e-23
    35     225    18.6    2397   2  A55535   versican precursor -     3.69e-23
    36     224    18.5    4391   2  A38096   perlecan precursor -     5.66e-23
    37     223    18.4    1268   2  S52781   neurocan - mouse         8.69e-23
    38     218    18.0     385   2  S53718   homeotic protein dlk     7.37e-22
    39     218    18.0     385   2  A54785   preadipocyte factor 1    7.37e-22
    40     214    17.7    1295   2  A32901   glp1 protein precurso    4.04e-21
    41     207    17.1     473   2  A56175   adhesive plaque prote    7.81e-20
    42     207    17.1    1376   2  G00043   osteonidogen - human     7.81e-20
    43     205    16.9    1959   1  AGRT     agrin - rat              1.81e-19
    44     204    16.8    4544   1  S02392   alpha-2-macroglobulin    2.76e-19
    45     201    16.6     260   2  A44549   fetal antigen 1 homeo    9.72e-19



RESULT 1
ENTRY A31640 #type fragment
TITLE epidermal growth factor-like protein slit - fruit fly
(Drosophila melanogaster) (fragment)
ORGANISM #formal_name Drosophila melanogaster
DATE 28-Feb-1990 #sequence_revision 28-Feb-1990 #text_change
14-Aug-1998
ACCESSIONS A31640
REFERENCE A31640
#authors Rothberg, J.M.; Hartley, D.A.; Walther, Z.;
Artavanis-Tsakonas, S.
#journal Cell (1988) 55:1047-1059
#title slit: An EGF-homologous locus of D. melanogaster involved in
the development of the embryonic central nervous system.
#cross-references MUID:89077533
#accession A31640
##molecule_type DNA
##residues 1-530 ##label ROT
##cross-references GB:M23543; NID:g340939; PID:g514357
GENETICS
#gene FlyBase:sli
##cross-references FlyBase:FBgn0003425
#introns 470/3
CLASSIFICATION #superfamily EGF homology
KEYWORDS growth factor
FEATURE
148-181 #domain EGF homology #label EGF
SUMMARY #length 530 #checksum 6330



Query Match 40.5%; Score 491; DB 2; Length 530; 

Best Local Similarity 39.5%; Pred. No. 3.64e-76; 

Matches 64; Conservative 43; Mismatches 49; Indels 6; Gaps 4; 

Db 170 CDCQAGFHGTNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMISMMY 229 

I: :h | ||::| III:: ||||: hi :| | | |:: |:| || :: 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPA-- 61 

Db 230 PQTSPCQNHECKHGVCFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELEPLR 289 

I llh II :| II: :|:| ||: I II I |::|| :::::: |: 
Qy 62 PK-SPCEGTECQNGA--NCVDQGNRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFTDLQ 118

Db 290 TRPEANVTI-VFSSGQNGILMYDGQDAHLAVELFNGRIRVSY 330 

: hi: II:: :llll:|:| : |:|| |: |::| II 
Qy 119 NWXRXNITLQVFTAEDNGILLYNGGNDHIAVXLYXGHVRFSY 160
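
The per-result figures in this listing appear to be related in a simple way: for Result 1 above, 64 matches over 64 + 43 + 49 + 6 = 162 aligned columns gives 39.5%, the value printed as Best Local Similarity. The sketch below parses a "Matches ...; Conservative ...;" line and recomputes that percentage. The relationship is inferred from the numbers in this report, not taken from the search program's documentation, and the function names `parse_alignment_counts` and `best_local_similarity` are hypothetical helpers introduced only for illustration.

```python
import re


def parse_alignment_counts(line: str) -> dict:
    """Return the 'Name value;' pairs of a summary line as a dict of ints."""
    return {name: int(value)
            for name, value in re.findall(r"([A-Za-z]+)\s+(\d+);", line)}


def best_local_similarity(counts: dict) -> float:
    """Matches as a percentage of all aligned columns.

    Assumption (inferred from this listing): the alignment length is
    Matches + Conservative + Mismatches + Indels.
    """
    columns = (counts["Matches"] + counts["Conservative"]
               + counts["Mismatches"] + counts["Indels"])
    return round(100.0 * counts["Matches"] / columns, 1)


if __name__ == "__main__":
    line = "Matches 64; Conservative 43; Mismatches 49; Indels 6; Gaps 4;"
    counts = parse_alignment_counts(line)
    print(counts)
    print(best_local_similarity(counts))
```

The same calculation reproduces the values printed for Results 2 and 4 (66 of 163 columns and 41 of 96 columns), which is a quick way to cross-check counts that the scan has damaged.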
RESULT 2
ENTRY B36665 #type complete
TITLE slit protein 2 precursor - fruit fly (Drosophila
melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change
16-Dec-1998
ACCESSIONS B36665
REFERENCE A36665
#authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.;
Artavanis-Tsakonas, S.
#journal Genes Dev. (1990) 4:2169-2187
#title slit: an extracellular protein necessary for development of
midline glia and commissural axon pathways contains both
EGF and LRR domains.
#cross-references MUID:91099665
#accession B36665
##status preliminary
##molecule_type mRNA
##residues 1-1469 ##label ROT
##cross-references GB:X53959
GENETICS
#gene FlyBase:sli
##cross-references FlyBase:FBgn0003425
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF
homology; leucine-rich alpha-2-glycoprotein repeat
homology; proteoglycan carboxyl-terminal homology

FEATURE
66-91 #domain proteoglycan amino-terminal homology #label PAH1\
101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\
288-313 #domain proteoglycan amino-terminal homology #label PAH2\
323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\
512-537 #domain proteoglycan amino-terminal homology #label PAH3\
547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\
708-733 #domain proteoglycan amino-terminal homology #label PAH4\
743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\
1028-1061 #domain EGF homology #label EGF
SUMMARY #length 1469 #molecular-weight 164695 #checksum 8361

Query Match 40.3%; Score 489; DB 2; Length 1469; 

Best Local Similarity 40.5%; Pred. No. 9.43e-76; 

Matches 66; Conservative 42; Mismatches 48; Indels 7; Gaps 5; 

Db 1050 CDCQAGFHGTNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMISMMY 1109

hi :h I ll|::| MM: hi :| I I |:: hi II 

Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPA-- 61

Db 1110 PQTSPCQNHECKHGVCFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELEPLR 1169 

I llh II :| II: :hl lh I II I l::|| :::::: h 
Qy 62 PK-SPCEGTECQNGA--NCVDQGNRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFTDLQ 118 

Db 1170 TRPEANVTI-VFSSAEQNGILMYDGQDAHLAVELFNGRIRVSY 1211

: hh ||: Ihlllhhl : hi |: |::| I 
Qy 119 NWXRXNITLQVFT-AEDNGILLYNGGNDHIAVXLYXGHVRFSY 160



RESULT 3
ENTRY A36665 #type complete
TITLE slit protein 1 precursor - fruit fly (Drosophila
melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change
24-Sep-1998
ACCESSIONS A36665; S13523
REFERENCE A36665
#authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.;
Artavanis-Tsakonas, S.
#journal Genes Dev. (1990) 4:2169-2187
#title slit: an extracellular protein necessary for development of
midline glia and commissural axon pathways contains both
EGF and LRR domains.
#cross-references MUID:91099665
#accession A36665
##status preliminary
##molecule_type mRNA
##residues 1-1480 ##label ROT
##cross-references GB:X53959; NID:g8614; PID:g8615
GENETICS
#gene FlyBase:sli
##cross-references FlyBase:FBgn0003425
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF
homology; leucine-rich alpha-2-glycoprotein repeat
homology; proteoglycan carboxyl-terminal homology



KEYWORDS alternative splicing
FEATURE
66-91 #domain proteoglycan amino-terminal homology #label PAH1\
101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\
288-313 #domain proteoglycan amino-terminal homology #label PAH2\
323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\
512-537 #domain proteoglycan amino-terminal homology #label PAH3\
547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\
708-733 #domain proteoglycan amino-terminal homology #label PAH4\
743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
791-814 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR17\
815-838 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR18\
846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\
1028-1061 #domain EGF homology #label EGF
SUMMARY #length 1480 #molecular-weight 165751 #checksum 900

Query Match 40.3%; Score 489; DB 2; Length 1480; 

Best Local Similarity 40.5%; Pred. No. 9.43e-76;

Matches 66; Conservative 42; Mismatches 48; Indels 7; Gaps 5; 

Db 1050 CDCQAGFHGTNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMISMMY 1109 

1:1 :h I lll::| lllh |:| :| I I h: |:| II :: 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPA-- 61 

Db 1110 PQTSPCQNHECKHGVCFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELEPLR 1169
I III: II :l II: :|:| lh I II I |::|| |: 
Qy 62 PK-SPCEGTECQNGA--NCVDQGNRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFTDLQ 118

Db 1170 TRPEANVTI-VFSSAEQNGILMYDGQDAHLAVELFNGRIRVSY 1211

: 1:1: lh l|:||||:|:l : 1 : 1 1 |: |::| II 
Qy 119 NWXRXNITLQVFT-AEDNGILLYNGGNDHIAVXLYXGHVRFSY 160



RESULT 4
ENTRY B26637 #type fragment
TITLE neurogenic repetitive locus 95F protein - fruit fly
(Drosophila melanogaster) (fragment)
ORGANISM #formal_name Drosophila melanogaster
DATE 16-Aug-1988 #sequence_revision 16-Aug-[year illegible] #text_change
14-Aug-1998
ACCESSIONS B26637
REFERENCE A91081
#authors Knust, E.; Dietrich, U.; Tepass, U.; Bremer, R.A.; Weigel,
D.; Vaessin, H.; Campos-Ortega, J.A.
#journal EMBO J. (1987) 6:761-766
#title EGF homologous sequences encoded in the genome of Drosophila
melanogaster, and their relation to neurogenic genes.
#cross-references MUID:87218537
#accession B26637
##molecule_type mRNA
##residues 1-293 ##label KNU
##cross-references GB:X05144; NID:g7519; PID:g929536
GENETICS
#gene FlyBase:crb
##cross-references FlyBase:FBgn0000368
CLASSIFICATION #superfamily EGF homology
KEYWORDS transmembrane protein
FEATURE
216-252 #domain EGF homology #label EGF
SUMMARY #length 293 #checksum 3413



Query Match 28.3%; Score 343; DB 2; Length 293; 

Best Local Similarity 42.7%; Pred. No. 5.00e-46; 

Matches 41; Conservative 20; Mismatches 33; Indels 2; Gaps 2 

Db 159 CQCQPGFEGQHCEQNIDECADQPCHNGGNCTDLIASYVCDCPEDYMGPQCDVLKQMTCEN 218 

1:1 II: l"l :| 1 = 1 h |:||: I I : II I |:| I I |:: :; : 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63

Db 219 EPCRNGSTCQNGFN-ASTGNNFTCTCVPGFEGPLCD 253 

II :|: IMI I II I |:||| II |: 
Qy 64 SPC-EGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98



RESULT 5 

ENTRY A35672 #type complete
TITLE crumbs protein - fruit fly (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 21-Sep-1990 #sequence_revision 18-Nov-1992 #text_change
14-Aug-1998
ACCESSIONS A35672
REFERENCE A35672
#authors Tepass, U.; Theres, C.; Knust, E.
#journal Cell (1990) 61:787-799
#title crumbs encodes an EGF-like protein expressed on apical
membranes of Drosophila epithelial cells and required for
organization of epithelia.
#cross-references MUID:90263104
#accession A35672
##status preliminary
##molecule_type mRNA
##residues 1-2139 ##label TEP
##cross-references GB:M33753
##note the authors translated the codon GGC for residue 1928 as
Cys, and TAT for residue 2023 as Gln
GENETICS
#gene FlyBase:crb
##cross-references FlyBase:FBgn0000368
CLASSIFICATION #superfamily EGF homology
KEYWORDS transmembrane protein
FEATURE
691-722 #domain EGF homology #label EGF
SUMMARY #length 2139 #molecular-weight 233619 #checksum 7230



Query Match 28.3%; Score 343; DB 2; Length 2139; 

Best Local Similarity 42.7%; Pred. No. 5.00e-46;

Matches 41; Conservative 20; Mismatches 33; Indels 2; Gaps 2;

Db 1821 CQCQPGFEGQHCEQNIDECADQPCHNGGNCTDLIASYVCDCPEDYMGPQCDVLKQMTCEN 1880

hi II: h:| :l hi h hlh I I : II I hi I I h: :: : 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63

Db 1881 EPCRNGSTCQNGFN-ASTGNNFTCTCVPGFEGPLCD 1915 

' II :h INI I lh I hill II h 
Qy 64 SPC-EGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98



RESULT 6 
ENTRY 
TITLE 

ALTERNATE_NAMES epidermal growth factor homolog precursor 



A40136 ttype complete 

fibropellin la • sea urchin (Strongylocentrotus purpuratus) 



CONTAINS 
ORGANISM 



alternatively spliced fibropellin lb (EGFI) 
tformaljame Strongylocentrotus purpuratus tcommon.name 
purple urchin 

13-May-1992 tsequence_revision 17-Sep-1997 ttext change 

07-Aug-1998 
A40136; B40136; C40136; A29316; A43131 
A40136 

Delgadillo-Reynosc.M.G.; Rollo, D.R.; Hursh, D.A.; Raff, 
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R.A. 

•journal J. Hoi. Evol, (1989) 29:314-327 

♦title Structural analysis of the uEGF gene in the sea urchin 

Strongylocentrotus purpuratus reveals more similarity to 
vertebrate than to invertebrate genes with EGF-like 



tcross -references MUID: 90112459 
((accession A40136 

ifstatus preliminary 

tfmolecule.type mRNA 

ttresidues 1-114 ttlabel DEL 

ttcross-references GB:X17530; NID:gl0225; PID:g667061 
taccession B40136 

tistatus preliminary; not compared with conceptual translation 

ttmolecule_type DNA 

^residues 181-251, 329-370, 'R' , 372-408, 'RA' ,411-441 filabel DE2 
taccession C40136 
•♦status preliminary; not compared with conceptual translation 
tSmolecule_type DNA 

ttresidues 'R', 747-821, 898-978 iflabel DE3 
REFERENCE A29316 

^tauthors Hursh, D.A.; Andrews, m.e.; Raff/ R.A. 
^t journal Science (1987) 237:1487-1490 

((title A sea urchin gene encodes a polypeptide homologous to 

epidermal growth factor, 
((cross -references MUID: 87319677 
taccession A29316 

ttstatus preliminary 
ttmolecule_type mRNA 

ttresidues 'S\ 280-481, 786-1064 ttlabel HUR 
ttcross-references GB:M17421; NID:gl61474; PID;g552260 
A43131 

•authors Hunt, L.T.; Barker, w.C. 
•journal FASEB J. (1989) 3:1760-1764 

•title Avidin-like domain in an epidermal growth factor homolog from 

a sea urchin, 
•cross-references MUID : 89196806 
•contents annotation 
COMMENT EGF homology repeats 10-17 are spliced out in the short form 
(fibropellin lb), 

CLASSIFICATION tsuperfamily Clr/Cls repeat homology; EGF homology 
FEATURE 

1-19 tdomain signal sequence tstatus predicted t label SIG\ 

20-1064 tproduct fibropellin I tstatus predicted tlabel FIB\ 

23-54 tdomain EGF homology tlabel EG01\ 

57-175 tdomain Clr/Cls repeat homology tlabel CSR\ 

180-211 tdomain EGF homology tlabel EG02\ 

218-249 tdomain EGF homology tlabel EG03\ 

256-287 tdomain EGF homology tlabel EG04\ 

•294-325 tdomain EGF homology tlabel EG05\ 

332-363 tdomain EGF homology tlabel EG06\ 

370-401 tdomain EGF homology tlabel EG07\ 

408-439 tdomain EGF homology tlabel EG08\ 

446-477 tdomain EGF homology tlabel EG09\ 

484-515 tdomain EGF homology tlabel EG10\ 

522-553 tdomain EGF homology tlabel E611\ 

. 560-591 tdomain EGF homology tlabel EG12\ 

598-629 tdomain EGF homology tlabel EG13\ 

636-667 tdomain EGF homology tlabel EG14\ 

674-705 tdomain EGF homology tlabel EG15\ 

712-743 tdomain EGF homology tlabel EG16\ 

750-781 tdomain EGF homology tlabel EG17\ 

788-819 tdomain EGF homology tlabel EG18\ 

826-857 tdomain EGF homology tlabel EG19\ 

864-895 tdomain EGF homology tlabel EG20\ 

902-933 tdomain EGF homology tlabel EG21\ 

936-1064 tregion avidin-like\ 

23-34,28-43,45-54, 
62-88,180-191, 
185-200,202-211, 
218-229,223-238, 
240-249,256-267, 
261-276,278-287, 



294 


305 299 


314 


316 


325,332 


343 


337 


352 354 


363 






390, 


392 


401,408 




413 




439' 


446 


457 451 


466 


468 


477 484 


495 


489 


jUu 




522 


533 527 


542 


544 


553,560 


571, 


565 


KOfl eon 


591 


598 




618 


620 


629,636 


647 


641 


QjO, vjG 


667, 


674 


685,679 


694, 


696 


705,712 


723 


717 


732,734 


743, 


750 


761,755 


770, 


772 


781,788 


799, 


793 


808,810 


819, 


826 


837,831 


846, 


848 


857,864 


875, 


869 


884,886 


895, 


902 


913,907 


922, 


924 


933 




SUMMARY 


tl 



tdisulfidejoonds tstatus predicted\ 



fdisulfideJ>onds tstatus predicted 



Query Match 25.4%; Score 308; DB 2; Length 1064; 

Best Local Similarity 46.3%; Pred. No. 4.27e-39;

Matches 44; Conservative 17; Mismatches 28; Indels 6; Gaps 2;

Db 808 CACVPGFTGSNCETNIDECASDPCLNGGICVDGVNGFVCQCPPNYSGTYCEIS--LDA- 863 

I II 1 M I II: M II:: I |: III III: I I 

Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 864 --CRSMPCQNGATCVNVGADYVCECVPGYAGQNCE 896 

I : lllll II: I 1 1 : 1 : 1 1 : : I :|| 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98 



RESULT 7 

ENTRY 150719 ttype complete 

TITLE C-Delta-1 - chicken 

ORGANISM tformaljiame Gallus gallus tcommon_name chicken 

DATE 13-Sep-1996 tsequence_revision 13-Sep-1996 ttext change 

14-Aug-1998 
ACCESSIONS 150719 
REFERENCE 150719 

tauthors Henrique, D.; Adam, J.; Myat, A.; Chitnis, A.; Lewis, J.; 
ish-Horowicz, D. 

•journal Nature (1995) 375:787-790 

•title Expression of a Delta homologue in prospective neurons in the 
chick . 

•cross-references MUID: 95319507 
taccession 150719 

Ifstatus preliminary; translated from GB/EMBL/DDBJ 
ttmolecule.type mRNA 
ttresidues 1-728 ttlabel HEN 
ttcross-references EMBL:U26590; NID:g882411; PID:g882412 
CLASSIFICATION tsuperfamily EGF homology 
FEATURE 

454-485 tdomain EGF homology tlabel EGF 

SUMMARY tlength 728 tmolecular-weight 79861 tchecksum 1765 

Query Match 25.1%; Score 304; DB 2; Length 728;

Best Local Similarity 43.4%; Pred. No. 2.61e-38; 

Matches 43; Conservative 21; Mismatches 29; Indels 6; Gaps 4; 

Db 436 CQCQAGFTGRHCDDNVDDCASFPCVNGGTCQDGVNDYSCTCPPGYNGKNCSTP - - V- S • R 491 

1:1 :|::| :| :| III MM II III |: hi I I : ; : 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 
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Db 492 --CEHNPCHNGATCHERSNRYVCECARGYGGLNCQFLLP 528 

II hill I Ihl hll :|: II: 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLS 102 



RESOLT 
ENTRY 
TITLE 
ORGANISM 
DATE 



ACCESSIONS 
REFERENCE 
fauthors 

♦journal 
kititle 



148324 #type complete 
DELTA- like 1 - mouse 

tformal_name Mus musculus tcommonjiame house mouse 
02-M-1996 ♦sequence.revision 02- Jul-1996 ♦text change 

28-Feb-1997 
148324 
148324 

Bettenhausen, b.; de Angel is , M.H.; Simon, D,; Guenet, J.L.; 

Gossler, A, 
Development (1995) 121:2407-2418 

•♦title Transient and restricted expression during mouse 
embryogenesis of Dill, a murine gene closely related to 
Drosophila Delta, 
♦cross-references MUID:95401858 
♦accession 148324 

♦♦status preliminary; translated from GB/EMBL/DDBJ 
♦♦molecule.type mRNA 
##residues 1-722 Mabel RES 
♦♦cross-references EMBL:X80903; NlD:g806569; PID:g806570 
GENETICS 

♦gene Dill 
SUMMARY tlength 722 tolecular -weight 78448 ♦checksum 1452 

Query Match 24.9%; Score 302; DB 2; Length 722;

Best Local Similarity 42.4%; Pred. No. 6.44e-38;

Matches 42; Conservative 22; Mismatches 29; Indels 6; Gaps 5;

Db 428 CRCQAGFSGRYCEDNVDDCASSPCANGGTCRDSVNDFSCTCPPGYTGKNCS-AP-V-S-R 483 

I I :h:l I :l III I II: I I II :|| |: ||:| I :| : : : 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 484 --CEHAPCHNGATCHQRGQRYMCECAQGYGGPNCQFLLP 520 

II : 1:111 I ::l I :|:| hllhl: ||: 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLS 102 



RESULT 
ENTRY 
SJLE 
INISM 



♦authors 

♦submission 

^accession 



S78549 tttype complete 
notch3 protein • human 
♦formaljiame Homo sapiens ♦commonjiame man 
24-Jul-1998 tsequence_revision 24-Jul-1998 ttext change 

17-Mar-1999 
S78549; S71825 
S78549 

Joutel, A.; Tournier-Lasserve, E. 
submitted to the EMBL Data Library, April 1997 
S78549 
#tmolecule_type mRNA 
♦♦residues 1-2321 itlabel JOUl 
(ttcross-references EMBL:U97669; NID:g2668591; PID:g2668592 
REFERENCE S71825 

♦authors Joutel, A.; Corpechot, C; Ducros, A.; Vahedi, K.; Chabriat, 
H.; Mouton, P.; Alamowitch, S.; Domenga, V,; Cecillion, M.; 
Marechal, E.; Maciazek, J.; Vayssiere, C; Cruaud, C; 
Cabanis, E.A.; Ruchoux, M.M.; Weissenbach, J.; Bach, J.F.; 
Bousser, M.G.; Tournier-Lasserve, E. 
♦journal Nature (1996) 383:707-710 

♦title Notch3 mutations in CADASIL, a hereditary adult-onset 

condition causing stroke and dementia, 
♦cross-references MUID:97032728 
♦accession S71825 

♦♦status nucleic acid sequence not shown 

♦♦molecule.type DNA 

♦♦residues 67-113 ; 138-194 ; 268-333, 'G' , 335-346r536-613 ; 716-765; 

1240-1279,-1815-1888 ♦♦label JOU2 
♦♦cross-references EMBL:U97669 
GENETICS' 



♦gene 

♦map_position 
FUNCTION 
♦description 

CLASSIFICATION 

KEYWORDS 
FEATURE 

318-349 

1838-1870 

1871-1903 

1905-1937 

1938-1970 

1971-2003 
SUMMARY 



notch3 
19pl3.1 

may be involved in pathogenesis of CADASIL, causing a type of 

stroke and dementia 
♦superfamily unassigned ankyrin repeat proteins; ankyrin 

repeat homology; EGF homology 
tandem repeat; transmembrane protein 

♦domain EGF homology flabel EGF\ 

♦domain ankyrin repeat homology Uabel AN1\ 

♦domain ankyrin repeat homology I label AN2\ 

♦domain ankyrin repeat homology ♦label AN3\ 

♦domain ankyrin repeat homology ♦label AN4\ 

♦domain ankyrin repeat homology * label AN5 

♦length 2321 ♦molecular -weight 243657 ♦checksum 3337 



Query Match 24.6%; Score 298; DB 2; Length 2321; 

Best Local Similarity 42.7%; Pred. No. 3.91e-37; 

Matches 41; Conservative 15; Mismatches 39; Indels 1; 



1; 



Db 



1108 CECLPGYNGDNCEDDVDECASQPCQHGGSCIDLVARYLCSCPPGTLGVLCEINEDDCGPG 1167 
111:111 Nil :: hi : Ihh |:| I | | |: | I |||| :| 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 1168 PPLDSGPRCLHNGTCVDLVGGFRCTCPPGYTGLRCE 1203 

= 1 : I I : : III -II lh I II 
Qy 64 SPCE-GTECQNGANCVDQGNRPVCQCLPGFGGPECE 98 



ENTRY A49175 ♦type fragment 

TITLE Motch B protein - mouse (fragment) 

ALTERNATEJAMES Notch homolog 

ORGANISM ♦formaljiame Mus musculus tcommonjiame house mouse 

DATE 21-Jan-1994 ♦sequence_revision 05 : Jan-1996 ♦text change 

14-Aug-1998 
A49175; PH1570; S32113 
A49175 

Lardelli, M. ; Lendahl, n. 
Exp. Cell Res. (1993) 204:364-372 
Motch A and Motch B--two mouse Notch homologues coexpressed 
in a wide variety of tissues, 
♦cross-references MUID: 93178563 
♦accession A49175 

♦♦status preliminary; nucleic acid sequence not shown 

♦tmolecule.type mRNA 

♦♦residues 1-1203 ♦♦label LAR 

♦♦cross-references EMBL:X6B279; NID:g287989; PID:g287990 

♦♦experimental_source embryo 

♦♦note ' sequence extracted from NCBI backbone (NCBIP: 126158) 

COMMENT This protein has many EGF repeats and lin-12/Notch repeats. 
COMMENT This protein is one of the neurogenic proteins controlling the 

decision between ectodermaland neural fate for cells in the early 

embryo. 

CLASSIFICATION ♦superfamily unassigned ankyrin repeat proteins; ankyrin 
repeat homology; EGF homology 



♦authors 
♦journal 
♦title 



FEATURE 

560-591 
SUMMARY 



♦domain EGF homology ♦label EGF 
♦length 1203 ♦checksum 910 



Query Match 24.3%; Score 294; DB 2; Length 1203; 

Best Local Similarity 44.8%; Pred. No. 2.36e-36;

Matches 43; Conservative 17; Mismatches 31; Indels 5; Gaps 4 

Db ■ 855.RCECVPGYQGVNCEYEVDECQNQPCQNGGTCIDLVNHFKCSCPPGTRGLLCE-ENI-D- 910 

lllhlll I II : hi::: ||||: hi II : I h I I III :: 
Qy 3 RCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAP 62 

Db 911 ECA-GGPHCLNGGQCVDRIGGYTCRCLPGFAGERCE 945 

: I I lh III: hllllhl II 
Qy 63 KSPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98 
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RESULT 
ENTRY 
TITLE 
ORGANISM 
DATE 



11 



S42612 ♦type complete 
transmembrane protein precursor ■ zebra fish 
♦formaljiame Brachydanio rerio Hcoramonjiame zebra fish 
20-Feb-1995 ♦sequence jrevis ion 20-Feb-1995 #text change 
lO-Jul-1998 
ACCESSIONS S42612 
REFERENCE S42612 

fauthors Bierkamp, C; Campos -Ortega, J, A. 
tjournal Mech. Dev, (1993) 43:87-100 

♦title A zebrafish homologue of the Drosophila neurogenic gene Notch 
and its pattern of transcription during early 
embryogenesis . 
♦cross -references MUID: 94128602 
♦accession S42612 

♦♦status preliminary 
♦♦molecule Jiype mRNA 
tfresidues 1-2437 itlabel BIE 
♦♦cross -references EMBL:X69088; NID;g433866; PID:g433867 
OSSIFICATION Isuperfamily unassigned ankyrin repeat proteins; ankyrin 
repeat homology 



CjflS S: 



"1915-1947 

1948-1980 
1982-2014 
2015-2047 
2048-2080 
SUMMARY 



♦domain ankyrin repeat homology Ilabel AN1\ 
♦domain ankyrin repeat homology Ilabel AN2\ 
♦domain ankyrin repeat homology Itlabel AN3\ 
♦domain ankyrin repeat homology ilabel AN4\ 
♦domain ankyrin repeat homology Itlabel AN5 
♦length 2437 tmolecular -weight 262306 ♦checksum 4021 



Query Match 24.0%; Score 291; DB 2; Length 2437; 

Best Local Similarity 44.8%; Pred. No. 9.10e-36;

Matches 43; Conservative 15; Mismatches 31; Indels 7; Gaps I 

Db 474 HCICMPGYEGVFCQINSDDCASQPCLNG-KCIDKINSFHCECPKGFSGSLCQV--DV-D- 528 

:| MNI I I I III : I II l:| :||: I |: |:|| ||:: ; 
Oy 3 RCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAP 62 

Db 529 -E-CASTPCKNGAKCTDGPNKYTCECTPGFSGIHCE 562 

I :l I llhl I I: hi ll|:| II 
0y 63 KSPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98 



RESULT 
ENTRY 
TITLE 



12 



ACCESSIONS 



A48836 #type complete 

fibropellin C precursor - sea urchin (Strongylocentrotus 
purpuratus) 

ALTERNATEJAMES EGF repeat-containing protein; epidermal growth 

•factor-related protein 3; fibropellin III 
NISM # formaljiame Strongylocentrotus purpuratus ♦common_name 

purple urchin 
01-Dec-1993 ♦sequence.revision 18-Nov-1994 *text change 

07-Aug-1998 
A48836 
A48836 

Bisgrove, B.W.; Raff, r.a. 
Dev. Biol. (1993) 157:526-538 

The SpEGF in gene encodes a member of the fibropellins: EGF 
repeat-containing proteins that form the apical lamina of 
the sea urchin embryo, 
♦cross-references MOID: 93273088 
♦accession A48836 

♦♦status preliminary 
»fmolecule_type mRNA 
♦♦residues 1-570 Itlabel BIS 
♦♦cross-references GB:L07045; NID:g310659; PID;g310660 



♦authors 
♦journal 
♦title 



♦♦note 

CLASSIFICATION 
FEATURE 

1-18 

19-570 

19-54 



sequence extracted from NCBI backbone (NCBIN:132724, 
NCBIP;132725) 
♦superfamily Clr/Cls repeat homology; EGF homology 

tdomain signal sequence ♦status predicted Ilabel SIG\ 
♦product fibropellin C ((status predicted ♦label FIB\ 
♦domain EGF homology Uabel EGF1\ 



57-175 




176-211 




214-249 




252-287 




290-325 




328-363 




366-401 




404-439 




442-570 




23-34,28-43,45-54 


62-88,180-191, 


185-200,202 


211, 


218-229,223 


238, 


240-249,256 


267, 


261-276,278 


287, 


294-305,299 


314, 


316-325,332 


343, 


337-352,354 


363, 


370-381,375 


390, 


392-401,408 


419, 


413-428,430 


439 



♦domain Clr/Cls repeat homology ilabel C1R\ 

♦domain EGF homology ♦label EGF2\ 

♦domain EGF homology ♦label EGF3\ 

♦domain EGF homology t label EGF4\ 

♦domain EGF homology ♦label EGF5\ 

♦domain EGF homology flabel EGF6\ 

♦domain EGF homology ♦label EGF7\ 

♦domain EGF homology ♦label EGF8\ 

♦region avidin-like\ 



SUMMARY 



♦disulfidejonds ♦status predicted 
♦length 570 tolecular -weight 61115 ♦checksum 5567 



Query Match 23.7%; Score 287; DB 2; Length 570; 

Best Local Similarity 42.1%; Pred. No. 5.47e-35;

Matches 40; Conservative 18; Mismatches 31; Indels 6; Gaps 2; 

Db 314 CDCRAGFTGSNCETNINECASSPCLNGGSCLDGVDGYVCQCLPNYTGTHCEIS--LDA-- 369 

1:1 :h:l II I -I I 11 = hi l::l I I |:| III: I I 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 370 -CASLPCQNGGVCTNVGGDYVCECLPGYTGINCE 402 

I : llll: I : I IMIIh I :|| 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98 



ENTRY A49128 ttype complete 

TITLE cell -fate determining gene Notch2 protein - rat 

ORGANISM ♦formaljiame Rattus norvegicus Icommonjiame Norway rat 

DATE 21-Jan-1994 tsequencejrevision 18-Nov-1994 itext change 

14-Aug-1998 
ACCESSIONS A49128 
REFERENCE A49128 

♦authors Weinmaster, G.; Roberts, V.J.; Lemke, G, 
♦journal Development (1992) 116:931-941 
♦title, Notch2: a second mammalian Notch gene, 
♦cross-references MUID: 93202015 
♦accession A49128 

♦♦status preliminary; not compared with conceptual translation 

tlmolecule.type mRNA 

♦♦residues 1-2471 ♦♦label WEI 

t taper intentaljource Schwann cell 

sequence extracted from NCBI backbone (NCBIP:127811) 
♦superfamily unassigned ankyrin repeat proteins; ankyrin 
repeat homology; EGF homology 



♦♦note 
CLASSIFICATION 

FEATURE 
1029-1060 
1876-1908 
1909-1941 
1943-1975 
1976-2008 
2009-2041 

SUMMARY 



♦domain EGF homology ♦label EGF\ 

♦domain ankyrin repeat homology ilabel ANl\ 

♦domain ankyrin repeat homology Ilabel AN2\ 

♦domain ankyrin repeat homology Ilabel AN3\ 

♦domain ankyrin repeat homology ♦label AN4\ 

♦domain ankyrin repeat homology ♦label AN5 

♦length 2471 taolecular-weight 265367 ichecksum 5929 



Query Match 23.7%; Score 287; DB 2; Length 2471;

Best Local Similarity 45.4%; Pred. No. 5.47e-35;

Matches 44; Conservative 17; Mismatches 29; Indels 7; Gaps 5; 

Db 1172 RCECVPGYQGVNCEYEVDECQNQPCQNGGTCIDLVNHFKCSCPPGTRGLLCE--ENI-D- 1227 

Hlhlll I II : hi::: |||; |:| | : I |: I [ ||| :: 
Qy 3 RCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAP 62 
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Db 1228 -D-CAGAPHCLNGGQCVDRIGGYSCRCLPGFAGERCE 1262 

I I: I Ih III: hllllhl II 
Qy 63 KSPCEGT-ECQNGANCVDQGNRPVCOCLPGFGGPECE 98 



RESULT 14
ENTRY           A56136 #type complete
TITLE           jagged protein precursor - rat
ORGANISM        #formal_name Rattus norvegicus #common_name Norway rat
DATE            28-Apr-1995 #sequence_revision 28-Apr-1995 #text_change 11-Aug-1995
ACCESSIONS      A56136
REFERENCE       A56136
   #authors     Lindsell, C.E.; Shawber, C.J.; Boulter, J.; Weinmaster, G.
   #journal     Cell (1995) 80:909-917
   #title       Jagged: a mammalian ligand that activates Notch1.
   #cross-references MUID:95211842
   #accession   A56136
      ##status  preliminary
      ##molecule_type mRNA
      ##residues 1-1220 ##label LIN
      ##cross-references GB:L38483
SUMMARY         #length 1220 #molecular-weight 134528 #checksum 2746



Query Match 23.3%; Score 283; DB 2; Length 1220;

Best Local Similarity 40.8%; Pred. No. 3.28e-34;

Matches 40; Conservative 18; Mismatches 34; Indels 6;



Db 473 RCICPPGYAGDHCERDIDECASNPCLNGGHCQNEINRFQCLCPTGFSGNLC----QLDI- 527
II I i 1 1 r 1 1 1 1 : |:| : | ||::| :l:| : Ml: |:|| I :| 
Qy 3 RCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAP 62



Db 528 DY-CEPNPCQNGAQCYNRASDYFCKCPEDYEGKNCSHL 564 

II lllll I :::: II : I :| I 
Qy 63 KSPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKL 100 



Db 510 DECASTPCRNGAKCVDQPDGYECRCAEGFEGTLCER 545

I :| I : I M : 1 1 1 1 : |:| || I ||: 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEK 99



Search completed: Fri May 28 08:54:12 1999 
Job time: 29 secs.



RESULT 15
ENTRY           S45306 #type complete
TITLE           notch 3 protein - mouse
ORGANISM        #formal_name Mus musculus #common_name house mouse
DATE            20-Feb-1995 #sequence_revision 20-Feb-1995 #text_change 10-Jul-1998
ACCESSIONS      S45306
REFERENCE       S45306
   #authors     Lardelli, M.; Dahlstrand, J.; Lendahl, U.
   #journal     Mech. Dev. (1994) 46:123-136
   #title       The novel Notch homologue mouse Notch 3 lacks specific
                epidermal growth factor-repeats and is expressed in
                proliferating neuroepithelium.
   #cross-references MUID:95001556
   #accession   S45306
      ##status  preliminary
      ##molecule_type mRNA
      ##residues 1-2318 ##label LAR
      ##cross-references EMBL:X74760; NID:g483580; PID:g483581
CLASSIFICATION  #superfamily unassigned ankyrin repeat proteins; ankyrin
                repeat homology
FEATURE
   1839-1871    #domain ankyrin repeat homology #label AN1\
   1872-1904    #domain ankyrin repeat homology #label AN2\
   1906-1938    #domain ankyrin repeat homology #label AN3\
   1939-1971    #domain ankyrin repeat homology #label AN4\
   1972-2004    #domain ankyrin repeat homology #label AN5
SUMMARY         #length 2318 #molecular-weight 244245 #checksum 9358



Query Match 23.0%; Score 279; DB 2; Length 2318;

Best Local Similarity 41.7%; Pred. No. 1.95e-33;

Matches 40; Conservative 20; Mismatches 30; Indels



Db 456 CICMAGFTGTYCEVDIDECQSSPCVNGGVCKDRVNGFSCTCPSGFSGSMC---QLDV" 509 

I ll:h:| I : |:|: I II: hi lh:ll I: hi :| :| 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 
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WW 1 L_ 
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LI U 1 1 





Run 



Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 



3:54:30 1999; MasPar time 6.51 Seconds 

694.790 Million cell updates/sec



:rch_pp protein - protein database search, using Smith-Waterman algorithm 

Fri May 2 

Tabular output not generated.

Title: >US-09-191-647-5 
Description: (1-160) from US09191647 . pep 
Perfect Score: 1212 

1 WPRCECMPGYAGDNCSENQD NGGNDHIAVXLYXGHVRFSY 160 



Scoring table: PAM 150 
Gap 11 

Searched: 77977 seqs, 28268293 residues 

Post-processing: Minimum Match 0% 

Listing first 45 summaries 

Database: swiss-prot37 
l:s 



Statistics: Mean 41.676; Variance 70.700; scale 0.589

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
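
The "Million cell updates/sec" figure in the run header appears to be the number of Smith-Waterman matrix cells (query length times database residues) divided by the MasPar time. The report does not state this; it is inferred from the numbers it prints, and the function name below is illustrative only. A minimal Python sketch under that assumption:

```python
def cell_updates_per_second(query_length: int, db_residues: int, seconds: float) -> float:
    """Assumed relation: one dynamic-programming cell per (query residue, database residue) pair."""
    return query_length * db_residues / seconds


# Values copied from this run's header: a 160-residue query,
# 28268293 database residues searched, 6.51 seconds of MasPar time.
rate = cell_updates_per_second(160, 28268293, 6.51)
print(f"{rate / 1e6:.1f} million cell updates/sec")
```

With these inputs the result agrees with the reported 694.790 to within the rounding of the run time, which is consistent with the assumed relation but does not prove it.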



Result           Query
   No.   Score   Match  Length  DB  ID           Description               Pred. No.
    1     489    40.3    1480   1  SLIT_DROME   SLIT PROTEIN PRECURSOR    3.30e-86
    2     343    28.3    2139   1  CRB_DROME    CRUMBS PROTEIN PRECURS    1.92e-52
    3     308    25.4    1064   1  FBP1_STRPU   FIBROPELLIN I PRECURSO    1.40e-44
    4     304    25.1     723   1  DLL1_HUMAN   DELTA-LIKE PROTEIN 1 P    1.09e-43
    5     302    24.9     722   1  DLL1_MOUSE   DELTA-LIKE PROTEIN 1 P    3.04e-43
    6     295    24.3     714   1  DLL1_RAT     DELTA-LIKE PROTEIN 1 P    1.09e-41
    7     291    24.0    2437   1  NOTC_BRARE   NEUROGENIC LOCUS NOTCH    8.36e-41
    8     287    23.7     570   1  FBP3_STRPU   FIBROPELLIN C PRECURSO    6.39e-40
    9     279    23.0    2318   1  NTC3_MOUSE   NEUROGENIC LOCUS NOTCH    3.68e-38
   10     273    22.5    2531   1  NTC1_MOUSE   NEUROGENIC LOCUS NOTCH    7.61e-37
   11     272    22.4    2444   1  NTC1_HUMAN   NEUROGENIC LOCUS NOTCH    1.26e-36
   12     272    22.4    2531   1  NTC1_RAT     NEUROGENIC LOCUS NOTCH    1.26e-36
   13     269    22.2    1964   1  NTC4_MOUSE   NEUROGENIC LOCUS NOTCH    5.69e-36
   14     269    22.2    2703   1  NOTC_DROME   NEUROGENIC LOCUS NOTCH    5.69e-36
   15     268    22.1    2524   1  NOTC_XENLA   NEUROGENIC LOCUS NOTCH    9.41e-36
   16     246    20.3    1429   1  LI12_CAEEL   LIN-12 PROTEIN PRECURS    5.44e-31
   17     241    19.9    1408   1  SERR_DROME   SERRATE PROTEIN PRECUR    6.42e-30
   18     231    19.1    3562   1  PGCV_CHICK   VERSICAN CORE PROTEIN     8.66e-28
   19     229    18.9    3396   1  PGCV_HUMAN   VERSICAN CORE PROTEIN     2.30e-27
   20     227    18.7     880   1  DL_DROME     NEUROGENIC LOCUS DELTA    6.08e-27
   21     225    18.6     862   1  PGCV_MACNE   VERSICAN CORE PROTEIN     1.61e-26
   22     225    18.6    1257   1  PGCN_RAT     NEUROCAN CORE PROTEIN     1.61e-26
   23     225    18.6    3358   1  PGCV_MOUSE   VERSICAN CORE PROTEIN     1.61e-26
   24     224    18.5    4393   1  PGBM_HUMAN   BASEMENT MEMBRANE-SPEC    2.61e-26
   25     223    18.4    1268   1  PGCN_MOUSE   NEUROCAN CORE PROTEIN     4.24e-26
   26     218    18.0     385   1  DLK_MOUSE    DELTA-LIKE PROTEIN PRE    4.76e-25
   27     214    17.7    1295   1  GLP1_CAEEL   GLP-1 PROTEIN PRECURSO    3.27e-24
   28     207    17.1    1376   1  NID2_HUMAN   NIDOGEN-2 PRECURSOR (N    9.31e-23
   29     205    16.9    1959   1  AGRI_RAT     AGRIN PRECURSOR.          2.41e-22
   30     204    16.8    4544   1  LRP1_HUMAN   LOW-DENSITY LIPOPROTEI    3.88e-22
   31     201    16.6     383   1  DLK_HUMAN    DELTA-LIKE PROTEIN PRE    1.61e-21
   32     198    16.3    5147   1  FAT_DROME    CADHERIN-RELATED TUMOR    6.65e-21
   33     193    15.9    3707   1  PGBM_MOUSE   BASEMENT MEMBRANE-SPEC    6.98e-20
   34     188    15.5    1955   1  AGRI_CHICK   AGRIN PRECURSOR.          7.21e-19
   35     181    14.9     432   1  NEL2_RAT     NEL-LIKE PROTEIN (FRAG    1.85e-17
   36     179    14.8    2109   1  PGCA_CHICK   AGGRECAN CORE PROTEIN     4.63e-17
   37     178    14.7     515   1  APX1_CAEEL   APX-1 PROTEIN PRECURSO    7.33e-17
   38     172    14.2     883   1  PGCB_RAT     BREVICAN CORE PROTEIN     1.13e-15
   39     171    14.1     564   1  PGCA_CANFA   AGGRECAN CORE PROTEIN     1.79e-15
   40     171    14.1     883   1  PGCB_MOUSE   BREVICAN CORE PROTEIN     1.79e-15
   41     165    13.6    4543   1  LRP1_CHICK   LOW-DENSITY LIPOPROTEI    2.67e-14
   42     164    13.5     886   1  EMR1_HUMAN   CELL SURFACE GLYCOPROT    4.18e-14
   43     161    13.3     428   1  NEL2_HUMAN   NEL-LIKE PROTEIN (FRAG    1.60e-13
   44     161    13.3    2871   1  FBN1_BOVIN   FIBRILLIN 1 PRECURSOR     1.60e-13
   45     160    13.2    2871   1  FBN1_MOUSE   FIBRILLIN 1 PRECURSOR.    2.49e-13



RESULT 1 

ID SLIT_DROME STANDARD; PRT; 1480 AA.

AC P24014;

DT 01-MAR-1992 (REL. 21, CREATED)

DT 01-MAR-1992 (REL. 21, LAST SEQUENCE UPDATE)

DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE)

DE SLIT PROTEIN PRECURSOR.

GN SLI.

OS DROSOPHILA MELANOGASTER (FRUIT FLY).

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;

OC DROSOPHILIDAE; DROSOPHILA.

RN [1]

RP SEQUENCE FROM N.A.

RX MEDLINE; 91099665.

RA ROTHBERG J.M., JACOBS J.R., GOODMAN C.S., ARTAVANIS-TSAKONAS S.;

RT "Slit: an extracellular protein necessary for development of midline

RT glia and commissural axon pathways contains both EGF and LRR

RT domains.";

RL GENES DEV. 4:2169-2187(1990).

CC -!- FUNCTION: NECESSARY FOR DEVELOPMENT OF MIDLINE GLIA AND

CC COMMISSURAL AXON PATHWAYS. SLIT MAY INTERACT WITH EXTRACELLULAR

CC MATRIX MOLECULES.

CC -!- TISSUE SPECIFICITY: EXCRETED BY THE MIDLINE GLIA CELLS AND
CC EVENTUALLY DISTRIBUTED ALONG THE AXONS.

CC -!- ALTERNATIVE PRODUCTS: GIVES RISE TO 2 DISTINCT PROTEINS DIFFERING

CC BY 11 AA AT THE C-TERMINUS OF THE LAST EGF REPEAT.

CC -!- SIMILARITY: CONTAINS 7 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN

CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 22. TWO BLOCKS OF 6 LRR'S

CC AND TWO BLOCKS OF 5 LRR'S.

CC -!- SIMILARITY: CONTAINS A C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK).

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; X53959; G8615; -.

DR PIR; A36665; A36665.

DR FLYBASE; FBgn0003425; sli.

DR PROSITE; PS00010; ASX_HYDROXYL; 3.

DR PROSITE; PS00022; EGF_1; 7.

DR PROSITE; PS01185; CTCK_1; 1. 
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DR PROSITE; PS01186; EGF_2; 5. 

DR PROSITE; PS01187; EGF_CA; 2.

DR PROSITE; PS01225; CTCK_2; 1.

DR PFAM; PF00007; Cys_knot; 1.

DR PFAM; PF00008; EGF; 7.

DR PFAM; PF00054; laminin_G; 1.

DR PFAM; PF00560; LRR; 10.

DR HSSP; P00740; 1IXA.

KW NEUROGENESIS; GLYCOPROTEIN; SIGNAL; ALTERNATIVE SPLICING;

KW EGF-LIKE DOMAIN; REPEAT; LEUCINE-REPEAT; DUPLICATION.
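
Each line of the SWISS-PROT entry above starts with a two-letter line code (ID, AC, DT, DE, OS, CC, DR, KW and so on) followed by its content. As a rough illustration of how such an entry could be read programmatically, the Python sketch below groups lines by code. The function name and the sample text are illustrative assumptions; this is not the parser used by the search program or by SWISS-PROT itself.

```python
from collections import defaultdict


def group_by_line_code(entry_text: str) -> dict[str, list[str]]:
    """Group the lines of a flat-file entry by their leading two-letter code."""
    fields = defaultdict(list)
    for line in entry_text.splitlines():
        line = line.strip()
        if len(line) < 2:
            continue  # skip blank lines left by the page layout
        code, _, rest = line.partition(" ")
        # Accept only two uppercase letters as a line code (ID, AC, DR, ...).
        if len(code) == 2 and code.isalpha() and code.isupper():
            fields[code].append(rest.strip())
    return dict(fields)


# A few lines taken from RESULT 1, used here only as sample input.
sample = """\
ID SLIT_DROME STANDARD; PRT; 1480 AA.
AC P24014;
DE SLIT PROTEIN PRECURSOR.
DR PFAM; PF00008; EGF; 7.
DR PFAM; PF00560; LRR; 10.
"""
entry = group_by_line_code(sample)
print(entry["AC"])
print(len(entry["DR"]))
```

Lines whose code was lost in scanning would be skipped by this sketch, so it is only useful on entries whose line codes are intact.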



FT 


SIGNAL 


1 


36 




FT 




37 


1480 


CI TT DDftTPTH 


FT 


DOMAIN 


70 


104 


nnucPRVPn h-pt iwTMft bwtcw ftp tup tdd 


FT 


DOMAIN 


105 


230 


T.FnCTKIP-RTrU DPDPATC /1CT BPI7TftM\ 


FT 


nAMATN 
UUHnlN 


231 


294 


PftMCPBVPn ft.PT &MVTHft DPftTftV ftP TUP TDD 




nnMATN 

UUflnlli 


295 


326 


PftMCPDUPn M-PT RWTWft DPftTftW ftp TUP T DT5 

UUNdLRYU) N rLANfUNlj RfcAjlUN Or lab LRR. 


PT 








TPTTPTMP-DTftU DPDPRTO / TVTR DPPTftM\ 

LLULlNt'KILH RbPtAia [ZaU REGION), 


PT 


nftMATN 


453 


MR 


ftftMCPDVPA ft-PT" RKTVTXTft DPrTftkT ftP TUP TDD 

UJNoLKVU) L rLANrUNb RrMUN Or lnh LRR. 




TlftMATM 


519 


550 


CONSERVED N-FLANKING REGION OF THE LRR. 


FT 


"IftMATH 


551 


653 


T PnPTMP-DTfU DPOP&TC /"lOft DPftTftU\ 
liLULinti Klttl KCrfi/US \iKU KtblUnJ , 




nftMATM 


654 


714 


ftftHCPDUPH ft.PT RMVTXTft DPftTftM ftP TUP TDD 

UlNbtRVhU L-rLANKlNb RhblUN Uf Inc. LRR, 




DOMAIN 


715 


746 


CftMCPlJVPn H-PTAHIfTMf; DPGTftW ftp TUP TDD 


P 




747 


848 


T.rrTPTHP-DTPU DPDPATC MTU DPCTftNH 
LLULlflfj KILn Kfjrfjnlo (Qltl KIAjlUHJ, 


FT 


DOMAIN 


849 


910 


ftftMGPPVPTl P-PTRMVTMft BPftTftM ftP TUP TDD 




REPEAT 


105 


115 




FT 


REPEAT 


116 


139 


LRR 1-2* 


FT 


RFPPAT 


140 


163 


1 1' 




REPEAT 


164 


187 


TRP Ui 


FT 


REPEAT 


188 


211 


LRR 1-5 


PT 


RFPFAT 


212 


230 


TRP 1 f,' 


PT 


RFPPAT 


327 


337 


TPR %1 


FT 


RFPFAT 


338 


361 


LRR 2 ' 2 . 


PT 


DPDP1T 


362 


385 


LRR 2*3. 






lift 




tpb 


FT 


opppAT 










BPDP1T 


434 


452 


LRR 2"6. 


PT 


RPPPAT 


551 


562 


LRR 3-1. 




OPDPIVP 

KtFLAl 


563 


586 


LRR 3*2. 


PT 


KLrMU 


COT 

30/ 


610 


LRR 3*3. 


PT 


PFDPAT 


611 




tdd 


PT 


DFDP1T 


635 


7^7 




PT 


DPDPAT 






id ' ' 
LRR 4-1. 


PT 


KLFLA1 


ICO 

758 


781 


LRR 4-2. 


PT 


DPDF1T 

Ktrtnl 






top 


FT 


BPDPAT 


806 


829 




FT 


REPEAT 


830 


848 


LRR 4-5 


FT 


DOMAIN 


907 


944 


PfP-TTIfP 1 
Lbf LIMj 1. 






946 


983 


PPP-TTVP 1 




DOMAIN 






PftP-T XVV 1 /^J\T ^TrTW.DTMATXT^ / TiATPHTTTlT \ 

libr Llfib i, LAbLlUM'iJINDimj (POrtNTIAIi) 


■ 




1024 


1062 


PftP-T TVP A 




DOMAIN 


1064 


1100 


PrtP-TTVP ^ raT/'TrTM-DTHnTHft /DftTPMTTAT \ 


w 


DOMAIN 


1111 


1149 


P^P-TTVP C 


FT 


DOMAIN 


1353 


1392 


PRP.TJVp 7 
Lyf LJLAL / . 


FT 


DOMAIN 


1409 


1480 


CTCK. 


FT   VARSPLIC   1394   1404       MISSING (IN SHORT FORM).
FT   CARBOHYD    111    111       POTENTIAL.
FT   CARBOHYD    207    207       POTENTIAL.
FT   CARBOHYD    357    357       POTENTIAL.
FT   CARBOHYD    435    435       POTENTIAL.
FT   CARBOHYD    783    783       POTENTIAL.
FT   CARBOHYD    788    788       POTENTIAL.
FT   CARBOHYD    958    958       POTENTIAL.
FT   CARBOHYD    998    998       POTENTIAL.
FT   CARBOHYD   1060   1060       POTENTIAL.
FT   CARBOHYD   1159   1159       POTENTIAL.
FT   CARBOHYD   1175   1175       POTENTIAL.
FT   CARBOHYD   1243   1243       POTENTIAL.
FT   CARBOHYD   1292   1292       POTENTIAL.
FT   DISULFID    911    922       BY SIMILARITY.
FT   DISULFID    916    932       BY SIMILARITY.
FT   DISULFID    934    943       BY SIMILARITY.
FT   DISULFID    950    961       BY SIMILARITY.
FT   DISULFID    955    971       BY SIMILARITY.
FT   DISULFID    973    982       BY SIMILARITY.
FT   DISULFID    989   1001       BY SIMILARITY.
FT   DISULFID    995   1010       BY SIMILARITY.
FT   DISULFID   1012   1021       BY SIMILARITY.
FT   DISULFID   1028   1041       BY SIMILARITY.
FT   DISULFID   1035   1050       BY SIMILARITY.
FT   DISULFID   1052   1061       BY SIMILARITY.
FT   DISULFID   1068   1079       BY SIMILARITY.
FT   DISULFID   1073   1088       BY SIMILARITY.
FT   DISULFID   1090   1099       BY SIMILARITY.
FT   DISULFID   1115   1125       BY SIMILARITY.
FT   DISULFID   1120   1137       BY SIMILARITY.
FT   DISULFID   1139   1148       BY SIMILARITY.
FT   DISULFID   1357   1368       BY SIMILARITY.
FT   DISULFID   1362   1380       BY SIMILARITY.
FT   DISULFID   1382   1391       BY SIMILARITY.
FT   DISULFID   1409   1443       BY SIMILARITY.
FT   DISULFID   1423   1457       BY SIMILARITY.
FT   DISULFID   1434   1473       BY SIMILARITY.
FT   DISULFID   1438   1475       BY SIMILARITY.
FT   DISULFID   1442   1479       BY SIMILARITY.
SQ   SEQUENCE   1480 AA;  165752 MW;  2CD1C421 C



Query Match 40.3%; Score 489; DB 1; Length 1480;

Best Local Similarity 40.5%; Pred. No. 3.30e-86;

Matches 66; Conservative 42; Mismatches 48; Indels 7; Gaps 5; 

Db 1050 CDCQAGFHGTNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMISMMY 1109 

hi :|: I I:: |||::| 1 1 1 1 : hi :| I I |:: |:| I :: 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPA-- 61 

Db 1110 PQTSPCQNHECKHGVCFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELEPLR 1169 

I III: II :l II: :|:| lh I II I l::ll :::::: I: 
Qy 62 PK-SPCEGTECQNGA--NCVDQGNRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFTDLQ 118

Db 1170 TRPEANVTI-VFSSAEQNGILMYDGQDAHLAVELFNGRIRVSY 1211 

: I:h lh 1 1 : 1 1 1 1 : 1 : 1 : 1 : 1 1 |: l::| II 
Qy 119 NWXRXNITLQVFT-AEDNGILLYNGGNDHIAVXLYXGHVRFSY 160 
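
The two percentages printed above each alignment are consistent with simple ratios: Query Match looks like the hit's score divided by the query's Perfect Score, and Best Local Similarity looks like the number of identical positions divided by the number of aligned columns (matches, conservative substitutions, mismatches and indels). These formulas are inferred from the figures in this report, not taken from the program's documentation, and the function names are illustrative. A small Python sketch using the RESULT 1 values:

```python
def query_match_percent(score: int, perfect_score: int) -> float:
    """Assumed definition: score as a percentage of the query's self-match score."""
    return 100.0 * score / perfect_score


def best_local_similarity_percent(matches: int, conservative: int,
                                  mismatches: int, indels: int) -> float:
    """Assumed definition: identities as a percentage of all aligned columns."""
    aligned_columns = matches + conservative + mismatches + indels
    return 100.0 * matches / aligned_columns


# RESULT 1 (SLIT_DROME): Score 489 against a Perfect Score of 1212;
# Matches 66; Conservative 42; Mismatches 48; Indels 7.
qm = round(query_match_percent(489, 1212), 1)
bls = round(best_local_similarity_percent(66, 42, 48, 7), 1)
print(qm, bls)  # 40.3 40.5
```

The same ratios also reproduce the percentages reported for the other hits whose counts are fully legible, which supports the assumed definitions without confirming them.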



RESULT 2 

ID CRB_DROME STANDARD; PRT; 2139 AA.

AC P10040;

DT 01-MAR-1989 (REL. 10, CREATED)

DT 01-MAY-1991 (REL. 18, LAST SEQUENCE UPDATE)

DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE)

DE CRUMBS PROTEIN PRECURSOR (95F).

GN CRB.

OS DROSOPHILA MELANOGASTER (FRUIT FLY).

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;

OC DROSOPHILIDAE; DROSOPHILA.

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=OREGON-R; TISSUE=EMBRYO;

RX MEDLINE; 90263104.

RA TEPASS U., THERES C., KNUST E.;

RT "Crumbs encodes an EGF-like protein expressed on apical membranes of

RT Drosophila epithelial cells and required for organization of

RT epithelia.";

RL CELL 61:787-799(1990).

RN [2]

RP SEQUENCE OF 1663-1955 FROM N.A.

RX MEDLINE; 87218537.

RA KNUST E., DIETRICH U., TEPASS U., BREMER K.A., WEIGEL D.,

RA VAESSIN H., CAMPOS-ORTEGA J.A.;

RT "EGF homologous sequences encoded in the genome of Drosophila

RT melanogaster, and their relation to neurogenic genes.";

RL EMBO J. 6:761-766(1987).

CC -!- FUNCTION: MAY PLAY A ROLE IN THE DEVELOPMENT OF EPITHELIA, 
CC POSSIBLY FOR THE ESTABLISHMENT AND/OR MAINTENANCE OF CELL 
CC POLARITY. IT MAY ACT AS A SIGNAL. 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN. 
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cc 


-!■ PTM: 


PHOSPHORYLATED IN THE CYTOPLASMIC DOMAIN (POTENTIAL). 


FT 


DISULFID 


436 


451 


BY SIMILARITY. 


cc 
cc 


-!• SIMILARITY: CONTAINS 29 


EGF 'LIKE DOMAINS, 




nrcm t?m 
UIjUJjUD 


453 
468 


462 


BY SIMILARITY, 










FT 


DISULFID 


479 


BY SIMILARITY, 


cc 


This SWISS-PROT entry is copyright. It is produced through a collaboration 


FT 


DISULFID 


473 


488 


BY SIMILARITY. 


cc 


between 


the Swiss Institute of Bioinformatics and the EMBL outstation • 


FT 


DISULFID 


490 


499 


BY SIMILARITY, 


cc 


the Euro 


Dean Bioinformatics Institute. There are no restrictions on its 


FT 


DISULFID 


505 


515 


BY SIMILARITY, 


cc 


use by 


non-profit institutions as long as its content is in no way 


FT 


DISULFID 


509 


520 


BY SIMILARITY. 


cc 


modified and this statement is not removed. Osage by and for commercial 


FT 


DISULFID 


522 


531 


BY SIMILARITY. 


cc 


entities requires a license 


agreement (See http://www.isb-sib.ch/announce/ 


FT 


DISULFID 


549 


562 


BY SIMILARITY. 


cc 
cc 


or send 


in email to licenseGisb-sib.ch). 


FT 


DISULFID 


556 


569 


BY SIMILARITY. 
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DISULFID 


571 


580 


BY SIMILARITY. 


DR 


EMBL; M33753; G552087; ALT S 


EQ. 
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DISULFID 
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BY SIMILARITY. 


DR 


EMBL; X05144; E1746; ■. 




FT 


DISULFID 


591 


602 


BY SIMILARITY, 


DR 


EMBL; X05144; G929536; -. 




FT 


DISULFID 
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610 


BY SIMILARITY. 


DR 


PIR; B26637; B26637. 




FT 


DISULFID 
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624 


BY SIMILARITY , 




PIR; A35672; A35672. 




FT 


' DISULFID 
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BY SIMILARITY. 


r\B 

H 


FLYBASE; 


FBgnOOOO 


368; crb. 




FT 


DISULFID 
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645 


BY SIMILARITY. 


■ 


PROSITE; PS00010; ASXJYDROXYL; 15, 


FT 


DISULFID 


652 


664 


BY SIMILARITY. 




PROSITE; PSQQ022; EGFJ.; 26. 




FT 


DISULFID 
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673 


BY SIMILARITY. 




PROSITE; PS01186; 


EGF J; 17. 




FT 


DISULFID 


675 


684 


BY SIMILARITY. 




PROSITE; PS01187; EGF.CA; 15. 


FT 


DISULFID 


691 


702 


BY SIMILARITY. 




PFAM; PF 


30008; EGF; 26, 




FT 


DISULFID 


696 


711 


BY SIMILARITY. 


DR 


PFAM; PF00054; laminin_G; 3. 




FT 


DISULFID 


713 


722 


BY SIMILARITY, 


DR 


HSSP; P00740; 1IXA. 




FT 


DISULFID 


729 


740 


BY SIMILARITY, 
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DIFFERENTIATION; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE; 


FT 


DISULFID 


734 


749 


BY SIMILARITY, 


KW 


GLYCOPROTEIN; SIGNAL; PHOSPHORYLATION. 


FT 


DISULFID 


751 


760 


BY SIMILARITY, 


FT 


SIGNAL 
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91 


90 




FT 


DISULFID 


767 


778 


BY SIMILARITY, 
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CHAIN 


2139 


CRUMBS PROTEIN. 
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DISULFID 


772 


787 


BY SIMILARITY. 
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DOMAIN 


91 


2084 


EXTRACELLULAR (POTENTIAL) . 


FT 


DISULFID 


789 


799 


BY SIMILARITY. 


FT 


TRANSMEM 


2085 


2111 


POTENTIAL. 


FT 


DISULFID 


806 


817 


BY SIMILARITY. 


FT 


DOMAIN 


2112 


2139 


CYTOPLASMIC (POTENTIAL). 


FT 


DISULFID 


811 


826 


BY SIMILARITY. 
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DOMAIN 


267 


303 


EGF-LIKE 1. 
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DISULFID 


828 


837 


BY SIMILARITY, 


FT 


DOMAIN 


306 


343 


EGF-LIKE 2. 


FT 


DISULFID 


844 


855 


BY SIMILARITY. 


FT 


DOMAIN 


348 


386 


EGF-LIKE 3, 


FT 


DISULFID 


849 


890 


BY SIMILARITY. 
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DOMAIN 


388 


425 


EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL). 
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DISULFID 
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901 


BY SIMILARITY. 
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DOMAIN 


427 


463 


EGF-LIKE 5. 
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DISULFID 


908 


919 


BY SIMILARITY, 




DOMAIN 


464 


500 


EGF-LIKE 6. 


FT 


DISULFID 


913 


928 


BY SIMILARITY. 
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DOMAIN 
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532 


EGF-LIKE 7, 
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939 


BY SIMILARITY. 




DOMAIN 


545 


581 


EGF-LIKE 8. 
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DISULFID 


946 


957 


BY SIMILARITY, 
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DOMAIN 


582 


611 


EGF-LIKE 9, 


FT 


DISULFID 


952 


966 


BY SIMILARITY. 
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DOMAIN 


609 


646 


EGF-LIKE 10. 


FT 


DISULFID 


968 


977 


BY SIMILARITY, 
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DOMAIN 


648 
687 


685 


EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL). 
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DISULFID 


984 


995 


BY SIMILARITY. 


FT 


DOMAIN 


723 


EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


989 


1009 


BY SIMILARITY, 


FT 


DOMAIN 


725 


761 


EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


1011 


1020 


BY SIMILARITY. 


FT 


DOMAIN 


763 


800 


EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL) , 


FT 


DISULFID 


1211 


1222 


BY SIMILARITY. 


1 


DOMAIN 


802 


838 


EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


1216 


1231 


BY SIMILARITY, 


w 


DOMAIN 


840 


902 


EGF-LIKE 16. 


FT 


DISULFID 


1233 


1242 


BY SIMILARITY. 


FT 
FT 


DOMAIN 


904 


940 


EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL). 
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DISULFID 


1485 


1496 


BY SIMILARITY, 


DOMAIN 


942 


978 


EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL). 
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DISULFID 


1490 


1505 


BY SIMILARITY, 
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DOMAIN 


980 


1021 


EGF-LIKE 19. 
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DISULFID 


1507 


1516 


BY SIMILARITY. 


!F 


DOMAIN 


1207 


1243 


EGF-LIKE 20, 


FT 


DISULFID 


1763 


1774 


BY SIMILARITY, 


FT 


DOMAIN 


1481 


1517 


EGF-LIKE 21, 


FT 


DISULFID 


1768 


1783 


BY SIMILARITY. 
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DOMAIN 


1759 


1795 


EGF-LIKE 22. 


FT 


DISULFID 


1785 


1794 


BY SIMILARITY. 


FT 


DOMAIN 


1797 


1833 


EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


1801 


1812 


BY SIMILARITY. 


FT 


DOMAIN 


1835 


1871 


EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


1806 


1821 


BY SIMILARITY. 


FT 


DOMAIN 


1874 


1915 


EGF-LIKE 25. 


FT 


DISULFID 


1823 


1832 


BY SIMILARITY. 


FT 


DOMAIN 


1915 


1951 


EGF-LIKE 26. 
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1839 


1850 


BY SIMILARITY. 
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DOMAIN 


1953 


1989 


EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


1844 


1859 


BY SIMILARITY. 
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DOMAIN 


1991 


2029 


EGF-LIKE 28, CALCIUM-BINDING (POTENTIAL). 
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1870 


BY SIMILARITY, 
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DOMAIN 


2030 


2070 


EGF-LIKE 29. 


FT 
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BY SIMILARITY, 
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BY SIMILARITY. 
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BY SIMILARITY, 
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BY SIMILARITY, 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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1968 


BY SIMILARITY. 


FT 


DISULFID 
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BY SIMILARITY, 
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DISULFID 
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1977 


BY SIMILARITY, 


FT 


DISULFID 
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BY SIMILARITY. 
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DISULFID 


1979 
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BY SIMILARITY. 


FT 


DISULFID 
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385 


BY SIMILARITY. 


FT 
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BY SIMILARITY. 


FT 
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403 


BY SIMILARITY. 
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2002 


2017 


BY SIMILARITY. 
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412 


BY SIMILARITY. 
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2019 


2028 


BY SIMILARITY. 


FT 
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424 


BY SIMILARITY, 


FT 


CARBOHYD 


37 
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POTENTIAL. 
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BY SIMILARITY. 
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POTENTIAL. 
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198 


198 


POTENTIAL. 


RN 


238 


238 


POTENTIAL. 


RP 


239 


239 


POTENTIAL. 


RX 


336 


336 


POTENTIAL. 


RA 


400 


400 


POTENTIAL. 


RT 


550 


550 


POTENTIAL. 


RT 


565 


565 


POTENTIAL. 


RT 


736 


736 


POTENTIAL. 


RL 


746 


746 


POTENTIAL. 


CC 


860 


860 


POTENTIAL. 


CC 


884 


884 


POTENTIAL. 


CC 


976 


976 


POTENTIAL. 


CC 


1102 


1102 


POTENTIAL. 


CC 


1114 


1114 


POTENTIAL. 


CC 


1138 


1138 


POTENTIAL, 


CC 


1192 


1192 


POTENTIAL. 


CC 


1245 


1245 


POTENTIAL. 


CC 


1255 


1255 


POTENTIAL. 


CC 


1354 


1354 


POTENTIAL. 


CC 


1363 


1363 


POTENTIAL. 


CC 


1441 


1441 


POTENTIAL. 


CC 


1454 


1454 


POTENTIAL. 
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CC 


of annotations omitted. 


CC 








CC 




28.3%; 


Score 343; DB 1; Length 2139; 


CC 



FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

» CARBOHYD 
CARBOHYD 

Note: remainder 



Query Match 

Best Local Similarity 42.7%; Pred. No. 1.92e-52;



Matches 41; Conservative 20; Mismatches 33; Indels 2; Gaps 2; 

Db 1821 CQCQPGFEGQHCEQNIDECADQPCHNGGNCTDLIASYVCDCPEDYMGPQCDVLKQMTCEN 1880 

hi II: l::| :| hi h hlh I I : II I hi I I h: : 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63

Db 1881 EPCRNGSTCQNGFN-ASTGNNFTCTCVPGFEGPLCD 1915 

II :h MM I II I hill II I: 
Qy 64 SPC-EGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98



RESULT 3 

ID FBP1_STRPU STANDARD; PRT; 1064 AA.

AC P10079;

DT 01-MAR-1989 (REL. 10, CREATED)

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)

DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)

DE FIBROPELLIN I PRECURSOR (EPIDERMAL GROWTH FACTOR-RELATED PROTEIN 1)

DE (UEGF-1).

GN EGF1.

OS STRONGYLOCENTROTUS PURPURATUS (PURPLE SEA URCHIN).

OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA;
OC EUECHINOIDEA; ECHINACEA; ECHINOIDA; STRONGYLOCENTROTIDAE;
OC STRONGYLOCENTROTUS.
RN [1]

RP SEQUENCE FROM N.A.

RX MEDLINE; 90112459.

RA DELGADILLO-REYNOSO M.G., ROLLO D.R., HURSH D.A., RAFF R.A.;

RT "Structural analysis of the uEGF gene in the sea urchin

RT Strongylocentrotus purpuratus reveals more similarity to vertebrate

RT than to invertebrate genes with EGF-like repeats.";

RL J. MOL. EVOL. 29:314-327(1989).

RN [2]

RP SEQUENCE OF 279-476 AND 781-1064 FROM N.A.

RX MEDLINE; 87319677.

RA HURSH D.A., ANDREWS M.E., RAFF R.A.;

RT "A sea urchin gene encodes a polypeptide homologous to epidermal

RT growth factor.";

RL SCIENCE 237:1487-1490(1987).

RN [3]

RP AVIDIN-LIKE DOMAIN.

RX MEDLINE; 89196806.

RA HUNT L.T., BARKER W.C.;

RT "Avidin-like domain in an epidermal growth factor homolog from a sea

RT urchin.";

RL FASEB J. 3:1760-1764(1989).

RN [4]

RP CHARACTERIZATION.
RX MEDLINE; 91285254.

RA BISGROVE B.W., ANDREWS M.E., RAFF R.A.;

RT "Fibropellins, products of an EGF repeat-containing gene, form a

RT unique extracellular matrix structure that surrounds the sea urchin

RT embryo.";

RL DEV. BIOL. 146:89-99(1991).

CC -!- FUNCTION: FORM THE APICAL LAMINA, A COMPONENT OF THE EXTRACELLULAR
CC MATRIX.

CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR. IN VESICLES IN THE CYTOPLASM
CC OF UNFERTILIZED EGGS, THEN TO THE BASE OF THE HYALIN LAYER
CC THROUGHOUT DEVELOPMENT AND FINALLY IN THE APICAL LAMINA IN LATE
CC EMBRYOS AND EARLY LARVAE.

CC -!- DEVELOPMENTAL STAGE: MODERATE LEVELS IN UNFERTILIZED EGGS AND
CC DURING EARLY CLEAVAGE, THEN RAPIDLY INCREASES IN ABUNDANCE BETWEEN
CC LATE MORULA AND MESENCHYME BLASTULA STAGES TO MAXIMAL LEVELS
CC MAINTAINED THROUGH SUBSEQUENT STAGES. EXPRESSED BOTH MATERNALLY
CC AND ZYGOTICALLY.

CC -!- ALTERNATIVE PRODUCTS: TWO FORMS (IA AND IB) ARE PRODUCED BY

CC ALTERNATIVE SPLICING. THE SMALL FORM (IB) LACKS 8 EGF REPEATS.
CC -!- SIMILARITY: CONTAINS 21 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 1 CUB DOMAIN.

CC -!- SIMILARITY: THE C-TERMINAL DOMAIN OF THIS PROTEIN IS SIMILAR
CC TO AVIDIN/STREPTAVIDIN.

CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

DR EMBL; L08692; G161467; -.

DR EMBL; L08692; G161466; -.

DR EMBL; X17530; G667061; -.

DR EMBL; M17421; G552260; -.

DR EMBL; X17533; G667062; -.

DR PIR; A29316; A29316.

DR PROSITE; PS00010; ASX_HYDROXYL; 19.

DR PROSITE; PS00022; EGF_1; 19.

DR PROSITE; PS00577; AVIDIN; 1.

DR PROSITE; PS01180; CUB; 1.

DR PROSITE; PS01186; EGF_2; 19.

DR PROSITE; PS01187; EGF_CA; 19.

DR PFAM; PF00008; EGF; 21.

DR PFAM; PF00431; CUB; 1.

DR HSSP; P01132; 1EPH.

KW BIOTIN; ALTERNATIVE SPLICING; EGF-LIKE DOMAIN; REPEAT; SIGNAL;
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GLYCOPROTEIN. 
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SIGNAL 


1 
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POTENTIAL. 
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FIBROPELLIN I, 
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20 


55 


EGF-LIKE 1. 
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175 


CUB, 
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212 


EGF-LIKE 2, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


214 


250 


EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL), 


FT 
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EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 6, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


404 


440 


EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL). 
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DOMAIN 


442 
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EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL) . 
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DOMAIN 
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EGF-LIKE 10, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL), 
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EGF-LIKE 13, CALCIUM- BINDING (POTENTIAL). 
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EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL), 
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EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL), 
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DOMAIN 
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EGF-LIKE 16, CALCIUM- BINDING (POTENTIAL). 
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DOMAIN 
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EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL). 
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DOMAIN 


784 


820 


EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 19, CALCIUM- BINDING (POTENTIAL)'. 
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896 


EGF-LIKE 20. 
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934 


EGF-LIKE 21, CALCIUM- BINDING 


FT 


DOMAIN 


936 


1064 


AVIDIN-LIKE. 
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BY SIMILARITY. 
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28 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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DISULFID 


299 


314 


BY SIMILARITY. 
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DISULFID 


316 


325 


BY SIMILARITY. 
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DISULFID 


332 


343 


BY SIMILARITY. 
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DISULFID 
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352 


BY SIMILARITY. 


FT 


DISULFID 
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363 


BY SIMILARITY. 
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DISULFID 


370 


381 


BY SIMILARITY. 
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DISULFID 


375 


390 


BY SIMILARITY. 
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DISULFID 


392 


401 


BY SIMILARITY. 
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DISULFID 


408 


419 


BY SIMILARITY. 
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413 


428 


BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 


FT 


DISULFID 


451 
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BY SIMILARITY. 
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477 


BY SIMILARITY. 
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484 


495 


BY SIMILARITY. 
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489 


504 


BY SIMILARITY. 
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DISULFID 
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515 


BY SIMILARITY. 


FT 


DISULFID 


522 


533 


BY SIMILARITY. 


FT 


DISULFID 
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542 


BY SIMILARITY. 


FT 
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544 


553 


BY SIMILARITY, 
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DISULFID 


560 


571 


BY SIMILARITY, 
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DISULFID 


565 


580 


BY SIMILARITY, 


FT   DISULFID    582    591       BY SIMILARITY.
FT   DISULFID    598    609       BY SIMILARITY.
FT   DISULFID    603    618       BY SIMILARITY.
FT   DISULFID    620    629       BY SIMILARITY.
FT   DISULFID    636    647       BY SIMILARITY.
FT   DISULFID    641    656       BY SIMILARITY.
FT   DISULFID    658    667       BY SIMILARITY.
FT   DISULFID    674    685       BY SIMILARITY.
FT   DISULFID    679    694       BY SIMILARITY.
FT   DISULFID    696    705       BY SIMILARITY.
FT   DISULFID    712    723       BY SIMILARITY.
FT   DISULFID    717    732       BY SIMILARITY.
FT   DISULFID    734    743       BY SIMILARITY.
FT   DISULFID    750    761       BY SIMILARITY.
FT   DISULFID    755    770       BY SIMILARITY.
FT   DISULFID    772    781       BY SIMILARITY.
FT   DISULFID    788    799       BY SIMILARITY.
FT   DISULFID    793    808       BY SIMILARITY.
FT   DISULFID    810    819       BY SIMILARITY.
FT   DISULFID    826    837       BY SIMILARITY.


FT 


DISULFID 


831 
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BY SIMILARITY, 


FT 


DISULFID 
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857 


BY SIMILARITY. 
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DISULFID 
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BY SIMILARITY, 
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DISULFID 
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BY SIMILARITY , 


FT 


DISULFID 
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895 


BY SIMILARITY. 
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902 


913 


BY SIMILARITY, 


FT 


DISULFID 
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922 


BY SIMILARITY. 
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DISULFID 
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933 


BY SIMILARITY. 
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VARSPLIC 
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MISSING (IN FORM IB). 
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CARBOHYD 
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30 


POTENTIAL. 


FT   CARBOHYD    136    136       POTENTIAL.
FT   CARBOHYD    851    851       POTENTIAL.
FT   CONFLICT    279    279       L -> S (IN REF. 2).



SQ SEQUENCE 1064 AA; 112072 MW; FBD10D48 CRC32; 



Query Match 25.4%; Score 308; DB 1; Length 1064;

Best Local Similarity 46.3%; Pred. No. 1.40e-44;

Matches 44; Conservative 17; Mismatches 28; Indels 6; Gaps 2; 

Db 808 CACVPGFTGSNCETNIDECASDPCLNGGICVDGVNGFVCQCPPNYSGTYCEIS--LDA-- 863 

I || I |:| I ||: |:| ||:: I |: I I : | | 

Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 864 --CRSMPCQNGATCVNVGADYVCECVPGYAGQNCE 896 

I : lllll II: I :|| 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98 



RESULT 4 

ID   DLL1_HUMAN     STANDARD;      PRT;   723 AA.
AC   O00548;
DT   15-JUL-1998 (REL. 36, CREATED)
DT   15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE)
DT   15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE   DELTA-LIKE PROTEIN 1 PRECURSOR (DELTA1).
GN   DLL1.
OS   HOMO SAPIENS (HUMAN).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC   PRIMATES; CATARRHINI; HOMINIDAE; HOMO.
RN   [1]
RP   SEQUENCE FROM N.A.
RA   MANN R.S., GRAY G.E., HENRIQUE D., ISH-HOROWICZ D.,
RA   ARTAVANIS-TSAKONAS S.;
RL   SUBMITTED (MAY-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.
CC   -!- FUNCTION: MAY BE INVOLVED IN CELL-TO-CELL COMMUNICATION IN
CC       MAMMALIAN EMBRYOS. MAY HAVE A ROLE IN CELLULAR INTERACTIONS
CC       UNDERLYING SOMITOGENESIS AND DEVELOPMENT OF THE NERVOUS SYSTEM (BY
CC       SIMILARITY).
CC   -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC   -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS.
CC   -!- SIMILARITY: TO DROSOPHILA DELTA PROTEIN.
CC
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license@isb-sib.ch).
CC
DR   EMBL; AF003522; G2197069; -.
DR   PROSITE; PS00010; ASX_HYDROXYL; 3.
DR   PROSITE; PS00022; EGF_1; 8.
DR   PROSITE; PS01186; EGF_2; 8.
DR   PROSITE; PS01187; EGF_CA; 1.
DR   PFAM; PF00008; EGF; 6.
DR   HSSP; P00740; 1IXA.
KW   SIGNAL; EGF-LIKE DOMAIN; GLYCOPROTEIN; TRANSMEMBRANE.



FT   SIGNAL        1     17       POTENTIAL.
FT   CHAIN        18    723       DELTA-LIKE PROTEIN 1.
FT   DOMAIN       18    545       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    546    568       POTENTIAL.
FT   DOMAIN      569    723       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN      226    254       EGF-LIKE 1.
FT   DOMAIN      257    285       EGF-LIKE 2.
FT   DOMAIN      292    325       EGF-LIKE 3.
FT   DOMAIN      332    363       EGF-LIKE 4, CALCIUM BINDING (POTENTIAL).
FT   DOMAIN      370    402       EGF-LIKE 5.
FT   DOMAIN      409    440       EGF-LIKE 6.
FT   DOMAIN      447    478       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      485    516       EGF-LIKE 8.
FT   DISULFID    226    237       BY SIMILARITY.
FT   DISULFID    230    243       BY SIMILARITY.
FT   DISULFID    245    254       BY SIMILARITY.
FT   DISULFID    257    268       BY SIMILARITY.
FT   DISULFID    263    274       BY SIMILARITY.
FT   DISULFID    276    285       BY SIMILARITY.
FT   DISULFID    292    304       BY SIMILARITY.
FT   DISULFID    298    314       BY SIMILARITY.
FT   DISULFID    316    325       BY SIMILARITY.
FT   DISULFID    332    343       BY SIMILARITY.
FT   DISULFID    337    352       BY SIMILARITY.
FT   DISULFID    354    363       BY SIMILARITY.
FT   DISULFID    370    381       BY SIMILARITY.
FT   DISULFID    375    391       BY SIMILARITY.
FT   DISULFID    393    402       BY SIMILARITY.
FT   DISULFID    409    420       BY SIMILARITY.
FT   DISULFID    414    429       BY SIMILARITY.
FT   DISULFID    431    440       BY SIMILARITY.
FT   DISULFID    447    467       BY SIMILARITY.
FT   DISULFID    469    478       BY SIMILARITY.
FT   DISULFID    485    496       BY SIMILARITY.
FT   DISULFID    490    505       BY SIMILARITY.
FT   DISULFID    507    516       BY SIMILARITY.
FT   CARBOHYD    477    477       POTENTIAL.
SQ   SEQUENCE   723 AA;  77956 MW;  A1D48BDB CRC32;



Query Match 25.1%; Score 304; DB 1; Length 723; 

Best Local Similarity 42.4%; Pred. No. 1.09e-43;
Matches 42; Conservative 24; Mismatches 27; Indels 6; Gaps 5;

Db 429 CRCQAGFSGRHCDDNVDDCASSPCANGGTCRDGVNDFSCTCPPGYTGRNCS-AP-V-S-R 484

I I :l = 1 Ml I II: I I II :ll h I :| : : : 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 485 --CEHAPCHNGATCHERGHGYVCECARGYGGPNCQFLLP 521 

II : hill I ::!: 11:1 |:|||:|: lh 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLS 102 



RESULT 5 

ID   DLL1_MOUSE     STANDARD;      PRT;   722 AA.
AC   Q61483;
DT   01-NOV-1997 (REL. 35, CREATED)
DT   01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE)
DT   01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)
DE   DELTA-LIKE PROTEIN 1 PRECURSOR (DELTA1).
GN   DLL1.
OS   MUS MUSCULUS (MOUSE).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC   RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN   [1]
RP   SEQUENCE FROM N.A.
RC   STRAIN=BALB/C X C57BL/6; TISSUE=EMBRYO;
RX   MEDLINE; 95401858.
RA   BETTENHAUSEN B., DE ANGELIS M.H., SIMON D., GUENET J.-L., GOSSLER A.;
RT   "Transient and restricted expression during mouse embryogenesis of
RT   Dll1, a murine gene closely related to Drosophila Delta.";
RL   DEVELOPMENT 121:2407-2418(1995).
CC   -!- FUNCTION: MAY BE INVOLVED IN CELL-TO-CELL COMMUNICATION IN
CC       MAMMALIAN EMBRYOS. MAY HAVE A ROLE IN CELLULAR INTERACTIONS
CC       UNDERLYING SOMITOGENESIS AND DEVELOPMENT OF THE NERVOUS SYSTEM.
CC   -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC   -!- TISSUE SPECIFICITY: IN THE EMBRYO, EXPRESSED IN THE PARAXIAL
CC       MESODERM AND NERVOUS SYSTEM. EXPRESSED AT HIGH LEVELS IN ADULT
CC       HEART AND AT LOWER LEVELS IN ADULT LUNG.
CC   -!- DEVELOPMENTAL STAGE: EXPRESSED UNTIL DAY 15 IN THE EMBRYO.
CC       EXPRESSION THEN DECREASES AND INCREASES AGAIN IN THE ADULT.
CC   -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS.
CC   -!- SIMILARITY: TO DROSOPHILA DELTA PROTEIN.
CC
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license@isb-sib.ch).
CC
DR   EMBL; X80903; G806570; -.
DR   MGD; MGI:104659; DLL1.
DR   PROSITE; PS00010; ASX_HYDROXYL; 3.
DR   PROSITE; PS00022; EGF_1; 8.
DR   PROSITE; PS01186; EGF_2; 8.
DR   PROSITE; PS01187; EGF_CA; 2.
DR   PFAM; PF00008; EGF; 6.
DR   HSSP; P00740; 1IXA.
KW   SIGNAL; EGF-LIKE DOMAIN; GLYCOPROTEIN; TRANSMEMBRANE.



FT   SIGNAL        1     17       POTENTIAL.
FT   CHAIN        18    722       DELTA-LIKE PROTEIN 1.
FT   DOMAIN       18    545       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    546    568       POTENTIAL.
FT   DOMAIN      569    722       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN      225    253       EGF-LIKE 1.
FT   DOMAIN      256    284       EGF-LIKE 2.
FT   DOMAIN      291    324       EGF-LIKE 3.
FT   DOMAIN      331    362       EGF-LIKE 4, CALCIUM BINDING (POTENTIAL).
FT   DOMAIN      369    401       EGF-LIKE 5.
FT   DOMAIN      408    439       EGF-LIKE 6.
FT   DOMAIN      446    477       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      484    515       EGF-LIKE 8.
FT   DISULFID    225    236       BY SIMILARITY.
FT   DISULFID    229    242       BY SIMILARITY.
FT   DISULFID    244    253       BY SIMILARITY.
FT   DISULFID    256    267       BY SIMILARITY.
FT   DISULFID    262    273       BY SIMILARITY.
FT   DISULFID    275    284       BY SIMILARITY.
FT   DISULFID    291    303       BY SIMILARITY.
FT   DISULFID    297    313       BY SIMILARITY.
FT   DISULFID    315    324       BY SIMILARITY.
FT   DISULFID    331    342       BY SIMILARITY.
FT   DISULFID    336    351       BY SIMILARITY.
FT   DISULFID    353    362       BY SIMILARITY.
FT   DISULFID    369    380       BY SIMILARITY.
FT   DISULFID    374    390       BY SIMILARITY.
FT   DISULFID    392    401       BY SIMILARITY.
FT   DISULFID    408    419       BY SIMILARITY.
FT   DISULFID    413    428       BY SIMILARITY.
FT   DISULFID    430    439       BY SIMILARITY.
FT   DISULFID    446    466       BY SIMILARITY.
FT   DISULFID    468    477       BY SIMILARITY.
FT   DISULFID    484    495       BY SIMILARITY.
FT   DISULFID    489    504       BY SIMILARITY.
FT   DISULFID    506    515       BY SIMILARITY.
FT   CARBOHYD    476    476       POTENTIAL.
SQ   SEQUENCE   722 AA;  78448 MW;  5A647702 CRC32;



Query Match 24.9%; Score 302; DB 1; Length 722;

Best Local Similarity 42.4%; Pred. No. 3.04e-43;

Matches 42; Conservative 22; Mismatches 29; Indels 6; Gaps

Db 428 CRCQAGFSGRYCEDNVDDCASSPCANGGTCRDSVNDFSCTCPPGYTGKNCS-AP-V-S-R 483 

I I :h:| I :l III MM II :|| |: ||:| I :| : : : 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 484 --CEHAPCHNGATCHQRGQRYMCECAQGYGGPNCQFLLP 520 

II : 1:111 I ::l I :|:| hllhh l|: 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLS 102 



RESULT 6 

ID   DLL1_RAT       STANDARD;      PRT;   714 AA.
AC   P97677;
DT   01-NOV-1997 (REL. 35, CREATED)
DT   01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE)
DT   01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)
DE   DELTA-LIKE PROTEIN 1 PRECURSOR (DELTA1).
GN   DLL1.
OS   RATTUS NORVEGICUS (RAT).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC   RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN   [1]
RP   SEQUENCE FROM N.A.
RA   DISIBIO G., HEBSHI L., BOULTER J., WEINMASTER G.;
RL   SUBMITTED (DEC-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.
CC   -!- FUNCTION: MAY BE INVOLVED IN CELL-TO-CELL COMMUNICATION IN
CC       MAMMALIAN EMBRYOS. MAY HAVE A ROLE IN CELLULAR INTERACTIONS
CC       UNDERLYING SOMITOGENESIS AND DEVELOPMENT OF THE NERVOUS SYSTEM (BY
CC       SIMILARITY).
CC   -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC   -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS.
CC   -!- SIMILARITY: TO DROSOPHILA DELTA PROTEIN.
CC
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license@isb-sib.ch).
CC
DR   EMBL; U78889; G1699046; -.
DR   PROSITE; PS00010; ASX_HYDROXYL; 3.
DR   PROSITE; PS00022; EGF_1; 8.
DR   PROSITE; PS01186; EGF_2; 8.
DR   PROSITE; PS01187; EGF_CA; 2.
DR   PFAM; PF00008; EGF; 6.
DR   HSSP; P00740; 1IXA.
KW   SIGNAL; EGF-LIKE DOMAIN; GLYCOPROTEIN; TRANSMEMBRANE.
FT   SIGNAL        1     17       POTENTIAL.
FT   CHAIN        18    714       DELTA-LIKE PROTEIN 1.
FT   DOMAIN       18    537       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    538    560       POTENTIAL.
FT   DOMAIN      561    714       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN      225    253       EGF-LIKE 1.
FT   DOMAIN      256    284       EGF-LIKE 2.
FT   DOMAIN      291    324       EGF-LIKE 3.
FT   DOMAIN      331    362       EGF-LIKE 4, CALCIUM BINDING (POTENTIAL).
FT   DOMAIN      369    401       EGF-LIKE 5.
FT   DOMAIN      408    439       EGF-LIKE 6.
FT   DOMAIN      446    477       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      484    515       EGF-LIKE 8.
FT   DISULFID    225    236       BY SIMILARITY.
FT   DISULFID    229    242       BY SIMILARITY.
FT   DISULFID    244    253       BY SIMILARITY.
FT   DISULFID    256    267       BY SIMILARITY.
FT   DISULFID    262    273       BY SIMILARITY.
FT   DISULFID    275    284       BY SIMILARITY.
FT   DISULFID    291    303       BY SIMILARITY.
FT   DISULFID    297    313       BY SIMILARITY.
FT   DISULFID    315    324       BY SIMILARITY.
FT   DISULFID    331    342       BY SIMILARITY.
FT   DISULFID    336    351       BY SIMILARITY.
FT   DISULFID    353    362       BY SIMILARITY.
FT   DISULFID    369    380       BY SIMILARITY.
FT   DISULFID    374    390       BY SIMILARITY.
FT   DISULFID    392    401       BY SIMILARITY.
FT   DISULFID    408    419       BY SIMILARITY.
FT   DISULFID    413    428       BY SIMILARITY.
FT   DISULFID    430    439       BY SIMILARITY.
FT   DISULFID    446    466       BY SIMILARITY.
FT   DISULFID    468    477       BY SIMILARITY.
FT   DISULFID    484    495       BY SIMILARITY.
FT   DISULFID    489    504       BY SIMILARITY.
FT   DISULFID    506    515       BY SIMILARITY.
FT   CARBOHYD    476    476       POTENTIAL.
SQ   SEQUENCE   714 AA;  77378 MW;  604B76D1 CRC32;

Query Match 24.3%; Score 295; DB 1; Length 714;
Best Local Similarity 41.4%; Pred. No. 1.09e-41;
Matches 41; Conservative 23; Mismatches 29; Indels 6; Gaps 5;

Db 428 CRCQTGFSGRYCEDNVDDCASSPCANGGTCRDSVNDFSCTCPPGYTGRNCS-AP-V-S-R 483
1 1 I::| 1 : IN 111:11 li ill I: l|:|: 1 :| : : :
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63

Db 484 --CEHAPCHNGATCHQRGQRYMCECAQGYGGANCQFLLP 520
II : hill 1 ::| 1 :hl hll::|: l|:
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLS 102


RESULT 7
ID   NOTC_BRARE     STANDARD;      PRT;  2437 AA.
AC   P46530;
DT   01-NOV-1995 (REL. 32, CREATED)
DT   01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)
DT   15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE   NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN PRECURSOR.
GN   NOTCH.
OS   BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII;
OC   TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA;
OC   CYPRINIDAE; RASBORINAE; DANIO.
RN   [1]
RP   SEQUENCE FROM N.A.
RC   TISSUE=EMBRYO;
RX   MEDLINE; 94128602.
RA   BIERKAMP C., CAMPOS-ORTEGA J.A.;
RT   "A zebrafish homologue of the Drosophila neurogenic gene Notch and
RT   its pattern of transcription during early embryogenesis.";
RL   MECH. DEV. 43:87-100(1993).
CC   -!- FUNCTION: IMPLICATED IN CELL FATE SPECIFICATIONS DURING
CC       EMBRYO DEVELOPMENT. MAY BE INVOLVED IN THE FORMATION OF THE
CC       NEURAL PLATE, NOTOCHORD AND BRAIN VESICLES.
CC   -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC   -!- DEVELOPMENTAL STAGE: EXPRESSED IN ALL CELLS IN PREGASTRULATION
CC       STAGES. DURING GASTRULATION IS DIFFERENTIALLY EXPRESSED,
CC       ACCUMULATING PREDOMINANTLY IN THE PRECHORDAL MESODERM AND
CC       NOTOCHORD. AT THE END OF GASTRULATION, EXPRESSED ALONG THE
CC       ANTERIOR-POSTERIOR AXIS INCLUDING THE DEVELOPING NEURAL PLATE
CC       AND DIFFERENTIATING MESODERM. ALSO PRESENT IN THE DEVELOPING
CC       BRAIN AND HEAD REGIONS.
CC   -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC   -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC   -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC   -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license@isb-sib.ch).
CC
DR   EMBL; X69088; G433867; -.
DR   PROSITE; PS00010; ASX_HYDROXYL; 23.
DR   PROSITE; PS00022; EGF_1; 34.
DR   PROSITE; PS01186; EGF_2; 28.
DR   PROSITE; PS01187; EGF_CA; 22.
DR   PFAM; PF00008; EGF; 36.
DR   PFAM; PF00023; ank; 6.
DR   PFAM; PF00066; notch; 3.
DR   HSSP; P00740; 1IXA.
KW   DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW   TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.
FT   SIGNAL        1     20       POTENTIAL.
FT   CHAIN        21   2437       NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN.
FT   DOMAIN       21   1724       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM   1725   1747       POTENTIAL.
FT   DOMAIN     1748   2437       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       21     57       EGF-LIKE 1.
FT   DOMAIN       58     98       EGF-LIKE 2.
FT   DOMAIN      101    138       EGF-LIKE 3.
FT   DOMAIN      139    175       EGF-LIKE 4.
FT   DOMAIN      177    215       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      217    254       EGF-LIKE 6.
FT   DOMAIN      256    292       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      294    332       EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      334    370       EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      371    409       EGF-LIKE 10.
FT   DOMAIN      411    449       EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      451    487       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      489    524       EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      526    562       EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      564    599       EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      601    637       EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      639    674       EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      676    712       EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      714    749       EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      751    787       EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      789    825       EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      827    865       EGF-LIKE 22.
FT   DOMAIN      867    903       EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      905    941       EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      943    979       EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      981   1017       EGF-LIKE 26.
FT   DOMAIN     1019   1055       EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1057   1093       EGF-LIKE 28.
FT   DOMAIN     1095   1141       EGF-LIKE 29.
FT   DOMAIN     1143   1179       EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1181   1217       EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1219   1263       EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1265   1303       EGF-LIKE 33.
FT   DOMAIN     1305   1344       EGF-LIKE 34.
FT   DOMAIN     1346   1382       EGF-LIKE 35.
FT   DOMAIN     1385   1423       EGF-LIKE 36.
FT   DOMAIN     1446   1561       3 X LIN/NOTCH REPEATS.
FT   REPEAT     1446   1486       LIN/NOTCH 1.
FT   REPEAT     1487   1520       LIN/NOTCH 2.
FT   REPEAT     1521   1561       LIN/NOTCH 3.
FT   DOMAIN     1861   2074       6 X ANK MOTIF REPEATS.
FT   REPEAT     1861   1891       ANK MOTIF 1.
FT   REPEAT     1892   1940       ANK MOTIF 1.
FT   REPEAT     1941   1974       ANK MOTIF 1.
FT   REPEAT     1975   2007       ANK MOTIF 1.
FT   REPEAT     2008   2040       ANK MOTIF 1.
FT   REPEAT     2041   2074       ANK MOTIF 1.
FT   DOMAIN     2265   2276       POLY-GLN (OPA-REPEAT).
FT   DISULFID     25     35       BY SIMILARITY.
FT   DISULFID     29     45       BY SIMILARITY.
FT   DISULFID     47     56       BY SIMILARITY.
FT   DISULFID     62     73       BY SIMILARITY.
FT   DISULFID     67     86       BY SIMILARITY.
FT   DISULFID     88     97       BY SIMILARITY.
FT   DISULFID    105    116       BY SIMILARITY.
FT   DISULFID    110    126       BY SIMILARITY.
FT   DISULFID    128    137       BY SIMILARITY.
FT   DISULFID    143    154       BY SIMILARITY.
FT   DISULFID    148    163       BY SIMILARITY.
FT   DISULFID    165    174       BY SIMILARITY.
FT   DISULFID    181    194       BY SIMILARITY.
FT   DISULFID    188    203       BY SIMILARITY.
FT   DISULFID    205    214       BY SIMILARITY.
FT   DISULFID    221    232       BY SIMILARITY.
FT   DISULFID    226    242       BY SIMILARITY.
FT   DISULFID    244    253       BY SIMILARITY.
FT   DISULFID    260    271       BY SIMILARITY.
FT   DISULFID    265    280       BY SIMILARITY.
FT   DISULFID    282    291       BY SIMILARITY.
FT   DISULFID    298    311       BY SIMILARITY.
FT   DISULFID    305    320       BY SIMILARITY.
FT   DISULFID    322    331       BY SIMILARITY.
FT   DISULFID    338    349       BY SIMILARITY.
FT   DISULFID    343    358       BY SIMILARITY.
FT   DISULFID    360    369       BY SIMILARITY.
FT   DISULFID    375    386       BY SIMILARITY.
FT   DISULFID    380    397       BY SIMILARITY.
FT   DISULFID    399    408       BY SIMILARITY.
FT   DISULFID    415    428       BY SIMILARITY.
FT   DISULFID    422    437       BY SIMILARITY.
FT   DISULFID    439    448       BY SIMILARITY.
FT   DISULFID    455    466       BY SIMILARITY.
FT   DISULFID    460    475       BY SIMILARITY.
FT   DISULFID    477    486       BY SIMILARITY.
FT   DISULFID    493    503       BY SIMILARITY.
FT   DISULFID    498    512       BY SIMILARITY.
FT   DISULFID    514    523       BY SIMILARITY.
FT   DISULFID    530    541       BY SIMILARITY.
FT   DISULFID    535    550       BY SIMILARITY.
FT   DISULFID    552    561       BY SIMILARITY.
FT   DISULFID    568    578       BY SIMILARITY.
FT   DISULFID    573    587       BY SIMILARITY.
FT   DISULFID    589    598       BY SIMILARITY.
FT   DISULFID    605    616       BY SIMILARITY.
FT   DISULFID    610    625       BY SIMILARITY.
FT   DISULFID    627    636       BY SIMILARITY.
FT   DISULFID    643    653       BY SIMILARITY.
FT   DISULFID    648    662       BY SIMILARITY.
FT   DISULFID    664    673       BY SIMILARITY.
FT   DISULFID    680    691       BY SIMILARITY.
FT   DISULFID    685    700       BY SIMILARITY.
FT   DISULFID    702    711       BY SIMILARITY.
FT   DISULFID    718    728       BY SIMILARITY.
FT   DISULFID    723    737       BY SIMILARITY.
FT   DISULFID    739    748       BY SIMILARITY.
FT   DISULFID    755    766       BY SIMILARITY.
FT   DISULFID    760    775       BY SIMILARITY.
FT   DISULFID    777    786       BY SIMILARITY.
FT   DISULFID    793    804       BY SIMILARITY.
FT   DISULFID    798    813       BY SIMILARITY.
FT   DISULFID    815    824       BY SIMILARITY.
FT   DISULFID    831    842       BY SIMILARITY.
FT   DISULFID    836    853       BY SIMILARITY.
FT   DISULFID    855    864       BY SIMILARITY.
FT   DISULFID    871    882       BY SIMILARITY.
FT   DISULFID    876    891       BY SIMILARITY.
FT   DISULFID    893    902       BY SIMILARITY.
FT   DISULFID    909    920       BY SIMILARITY.
FT   DISULFID    914    929       BY SIMILARITY.
FT   DISULFID    931    940       BY SIMILARITY.
FT   DISULFID    947    958       BY SIMILARITY.
FT   DISULFID    952    967       BY SIMILARITY.
FT   DISULFID    969    978       BY SIMILARITY.
FT   DISULFID   1023   1034       BY SIMILARITY.
FT   DISULFID   1028   1043       BY SIMILARITY.
FT   DISULFID   1045   1054       BY SIMILARITY.
FT   DISULFID   1061   1072       BY SIMILARITY.
FT   DISULFID   1066   1081       BY SIMILARITY.
FT   DISULFID   1083   1092       BY SIMILARITY.
FT   DISULFID   1099   1120       BY SIMILARITY.
FT   DISULFID   1114   1129       BY SIMILARITY.
FT   DISULFID   1131   1140       BY SIMILARITY.
FT   DISULFID   1147   1158       BY SIMILARITY.
FT   DISULFID   1152   1167       BY SIMILARITY.
FT   DISULFID   1169   1178       BY SIMILARITY.
FT   DISULFID   1185   1196       BY SIMILARITY.
FT   DISULFID   1190   1205       BY SIMILARITY.
FT   DISULFID   1207   1216       BY SIMILARITY.
FT   DISULFID   1223   1242       BY SIMILARITY.
FT   DISULFID   1236   1251       BY SIMILARITY.
FT   DISULFID   1253   1262       BY SIMILARITY.

Note: remainder of annotations omitted.

Query Match 24.0%; Score 291; DB 1; Length 2437;
Best Local Similarity 44.8%; Pred. No. 8.36e-41;
Matches 43; Conservative 15; Mismatches 31; Indels

Db 474 HCICMPGYEGVFCQINSDDCASQPCLNG-KCIDKINSFHCECPKGFSGSLCQV-
:| M II 1 1 1 III : 1 II 1:1 :ll: 1 1: Ml II::
Qy 3 RCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIP

Db 529 -E-CASTPCKNGAKCTDGPNKYTCECTPGFSGIHCE 562
1 1 1 111:1 1 1 1:1 111:1 II
Qy 63 KSPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98
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RESULT 8 

ID FBP3_STRPU STANDARD; PRT; 570 AA.
AC P49013;
DT 01-FEB-1996 (REL. 33, CREATED)
DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)
DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)
DE FIBROPELLIN C PRECURSOR (EPIDERMAL GROWTH FACTOR-RELATED PROTEIN 3)
DE (EGF III) (FIBROPELLIN III).
GN EGF3.
OS STRONGYLOCENTROTUS PURPURATUS (PURPLE SEA URCHIN).
OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA;
OC EUECHINOIDEA; ECHINACEA; ECHINOIDA; STRONGYLOCENTROTIDAE;
OC STRONGYLOCENTROTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=GASTRULA;
RX MEDLINE; 93273088.
RA BISGROVE B.W., RAFF R.A.;
RT "The spEGF III gene encodes a member of the fibropellins: EGF repeat-
RT containing proteins that form the apical lamina of the sea urchin
RT embryo.";
RL DEV. BIOL. 157:526-538(1993).
CC -!- FUNCTION: FORM THE APICAL LAMINA, A COMPONENT OF THE EXTRACELLULAR
CC     MATRIX.
CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR.
CC -!- DEVELOPMENTAL STAGE: LOW LEVELS IN UNFERTILIZED EGGS AND DURING
CC     EARLY CLEAVAGE, THEN RAPIDLY INCREASES IN ABUNDANCE BETWEEN LATE
CC     MORULA AND MESENCHYME BLASTULA STAGES TO MAXIMAL LEVELS MAINTAINED
CC     THROUGH SUBSEQUENT STAGES.
CC     EXPRESSED BOTH MATERNALLY AND ZYGOTICALLY.
CC -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 1 CUB DOMAIN.
CC -!- SIMILARITY: THE C-TERMINAL DOMAIN OF THIS PROTEIN IS SIMILAR
CC     TO AVIDIN/STREPTAVIDIN.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; L07045; G310660; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 8.
DR PROSITE; PS00022; EGF_1; 8.
DR PROSITE; PS00577; AVIDIN; 1.
DR PROSITE; PS01180; CUB; 1.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 6.
DR PFAM; PF00008; EGF; 8.
DR PFAM; PF00431; CUB; 1.
DR HSSP; P00740; 1IXA.
KW BIOTIN; EGF-LIKE DOMAIN; REPEAT; SIGNAL; GLYCOPROTEIN.



FT SIGNAL        1     17  POTENTIAL.
FT CHAIN        18    570  FIBROPELLIN C.
FT DOMAIN       18     55  EGF-LIKE 1.
FT DOMAIN       62    175  CUB.
FT DOMAIN      176    212  EGF-LIKE 2, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      214    250  EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      252    288  EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      290    326  EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      328    364  EGF-LIKE 6, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      366    402  EGF-LIKE 7.
FT DOMAIN      404    440  EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      442    570  AVIDIN-LIKE.
FT DISULFID     23     34  BY SIMILARITY.
FT DISULFID     28     43  BY SIMILARITY.
FT DISULFID     45     54  BY SIMILARITY.
FT DISULFID    180    191  BY SIMILARITY.
FT DISULFID    185    200  BY SIMILARITY.
FT DISULFID    202    211  BY SIMILARITY.
FT DISULFID    218    229  BY SIMILARITY.
FT DISULFID    223    238  BY SIMILARITY.
FT DISULFID    240    249  BY SIMILARITY.
FT DISULFID    256    267  BY SIMILARITY.
FT DISULFID    261    276  BY SIMILARITY.
FT DISULFID    278    287  BY SIMILARITY.
FT DISULFID    294    305  BY SIMILARITY.
FT DISULFID    299    314  BY SIMILARITY.
FT DISULFID    316    325  BY SIMILARITY.
FT DISULFID    332    343  BY SIMILARITY.
FT DISULFID    337    352  BY SIMILARITY.
FT DISULFID    354    363  BY SIMILARITY.
FT DISULFID    370    381  BY SIMILARITY.
FT DISULFID    375    390  BY SIMILARITY.
FT DISULFID    392    401  BY SIMILARITY.
FT DISULFID    408    419  BY SIMILARITY.
FT DISULFID    413    428  BY SIMILARITY.
FT DISULFID    430    439  BY SIMILARITY.
FT CARBOHYD     30     30  POTENTIAL.
FT CARBOHYD    136    136  POTENTIAL.
FT CARBOHYD    357    357  POTENTIAL.
SQ SEQUENCE 570 AA; 61116 MW; 265BC4BB CRC32;



Query Match 23.7%; Score 287; DB 1; Length 570; 

Best Local Similarity 42.1%; Pred. No. 6.39e-40; 

Matches 40; Conservative 18; Mismatches 31; Indels 6; Gaps 2; 

Db 314 CDCRAGFTGSNCETNINECASSPCLNGGSCLDGVDGYVCQCLPNYTGTHCEIS--LDA-- 369

hi II I ::l I II: hi h:| I I hi III: I I 

Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 370 --CASLPCQNGGVCTNVGGDYVCECLPGYTGINCE 402 

I : lllh I : I MM!: | :|| 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98 



RESULT 9
ID NTC3_MOUSE STANDARD; PRT; 2318 AA.
AC Q61982;
DT 01-NOV-1997 (REL. 35, CREATED)
DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH 3 PROTEIN.
GN NOTCH3.
OS MUS MUSCULUS (MOUSE).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=ICR X SWISS WEBSTER;
RX MEDLINE; 95001556.
RA LARDELLI M., DALSTRAND J., LENDAHL U.;
RT "The novel Notch homologue mouse Notch 3 lacks specific epidermal
RT growth factor-repeats and is expressed in proliferating
RT neuroepithelium.";
RL MECH. DEV. 46:123-136(1994).
CC -!- FUNCTION: NOTCH 1, 2 AND 3 PLAY A COMBINATIONAL ROLE DURING
CC     VARIOUS CELL FATE DECISIONS AND MORPHOLOGICAL MOVEMENTS IN THE
CC     DEVELOPING CNS AND PROBABLY OTHER REGIONS OF THE EMBRYO.
CC -!- TISSUE SPECIFICITY: PROLIFERATING NEUROEPITHELIUM.
CC -!- DEVELOPMENTAL STAGE: CNS DEVELOPMENT.
CC -!- SIMILARITY: CONTAINS 34 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 CDC10/SWI6 REPEATS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC










DR EMBL; X74760; G483581; -.
DR MGD; MGI:99460; NOTCH3.
DR PROSITE; PS00010; ASX_HYDROXYL; 18.
DR PROSITE; PS00022; EGF_1; 33.
DR PROSITE; PS01186; EGF_2; 27.
DR PROSITE; PS01187; EGF_CA; 17.
DR PFAM; PF00008; EGF; 33.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE;
KW GLYCOPROTEIN.
FT DOMAIN        1   1643
FT TRANSMEM   1644   1664  POTENTIAL.
FT DOMAIN     1665   2318  CYTOPLASMIC.
FT DOMAIN       39   1374  34 X EGF-TYPE REPEATS.
FT DOMAIN     1388   1503  3 X LIN/NOTCH REPEATS.
FT DOMAIN     1784   1998  6 X CDC10/SWI6 REPEATS.
FT DOMAIN     2242   2261
FT DOMAIN               78  EGF-LIKE 1.
FT DOMAIN       79    119  EGF-LIKE 2.
FT DOMAIN                  EGF-LIKE 3.
FT DOMAIN      159         EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      198    235  EGF-LIKE 5.
FT DOMAIN                  EGF-LIKE 6, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      275         EGF-LIKE 7.
FT DOMAIN      315         EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN             390  EGF-LIKE 9.
FT DOMAIN      392    430  EGF-LIKE 10, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      432    468  EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN             506  EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN             544  EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN             581  EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN             619  EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      621    656  EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      658    694  EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      696    731  EGF-LIKE 18.
FT DOMAIN                  EGF-LIKE 19.
FT DOMAIN             809  EGF-LIKE 20.
FT DOMAIN      811         EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      850    886  EGF-LIKE 22, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN                  EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      925    961  EGF-LIKE 24.
FT DOMAIN             999  EGF-LIKE 25.
FT DOMAIN     1001         EGF-LIKE 26.
FT DOMAIN     1037   1083
FT DOMAIN                  EGF-LIKE 28.
FT DOMAIN            1159  EGF-LIKE 29, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN     1161         EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN     1206   1245  EGF-LIKE 31.
FT DOMAIN     1247   1288  EGF-LIKE 32.
FT DOMAIN     1290   1326  EGF-LIKE 33.
FT DOMAIN     1336   1374  EGF-LIKE 34.
FT REPEAT     1388   1428  LIN/NOTCH 1.
FT REPEAT     1429   1467  LIN/NOTCH 2.
FT REPEAT     1468   1503  LIN/NOTCH 3.
FT REPEAT     1784   1816  CDC10/SWI6 1.
FT REPEAT     1817   1865  CDC10/SWI6 2.
FT REPEAT     1866   1898  CDC10/SWI6 3.
FT REPEAT     1899   1932  CDC10/SWI6 4.
FT REPEAT     1933   1965  CDC10/SWI6 5.
FT REPEAT     1966   1998  CDC10/SWI6 6.
FT DISULFID     43     55  BY SIMILARITY.
FT DISULFID     49     66  BY SIMILARITY.
FT DISULFID     68     77  BY SIMILARITY.
FT DISULFID     83     94  BY SIMILARITY.
FT DISULFID     88    107  BY SIMILARITY.
FT DISULFID    109    118  BY SIMILARITY.
FT DISULFID    124    135  BY SIMILARITY.
FT DISULFID    129    145  BY SIMILARITY.
FT DISULFID    147    156  BY SIMILARITY.
FT DISULFID    163    175  BY SIMILARITY.
FT DISULFID    169    184  BY SIMILARITY.
FT DISULFID    186    195  BY SIMILARITY.
FT DISULFID    202    213  BY SIMILARITY.
FT DISULFID    207    223  BY SIMILARITY.
FT DISULFID    225    234  BY SIMILARITY.
FT DISULFID    241    252  BY SIMILARITY.
FT DISULFID    246    261  BY SIMILARITY.
FT DISULFID    263    272  BY SIMILARITY.
FT DISULFID    279    292  BY SIMILARITY.
FT DISULFID    286    301  BY SIMILARITY.
FT DISULFID    303    312  BY SIMILARITY.
FT DISULFID    319    330  BY SIMILARITY.
FT DISULFID    324    339  BY SIMILARITY.
FT DISULFID    341    350  BY SIMILARITY.
FT DISULFID    356    367  BY SIMILARITY.
FT DISULFID    361    378  BY SIMILARITY.
FT DISULFID    380    389  BY SIMILARITY.
FT DISULFID    396    409  BY SIMILARITY.
FT DISULFID    403    418  BY SIMILARITY.
FT DISULFID    420    429  BY SIMILARITY.
FT DISULFID    436    447  BY SIMILARITY.
FT DISULFID    441    456  BY SIMILARITY.
FT DISULFID    458    467  BY SIMILARITY.
FT DISULFID    474    485  BY SIMILARITY.
FT DISULFID    479    494  BY SIMILARITY.
FT DISULFID    496    505  BY SIMILARITY.
FT DISULFID    512    523  BY SIMILARITY.
FT DISULFID    517    532  BY SIMILARITY.
FT DISULFID    534    543  BY SIMILARITY.
FT DISULFID    550    560  BY SIMILARITY.
FT DISULFID    555    569  BY SIMILARITY.
FT DISULFID    571    580  BY SIMILARITY.
FT DISULFID    587    598  BY SIMILARITY.
FT DISULFID    592    607  BY SIMILARITY.
FT DISULFID    609    618  BY SIMILARITY.
FT DISULFID    625    635  BY SIMILARITY.
FT DISULFID    630    644  BY SIMILARITY.
FT DISULFID    646    655  BY SIMILARITY.
FT DISULFID    662    673  BY SIMILARITY.
FT DISULFID    667    682  BY SIMILARITY.
FT DISULFID    684    693  BY SIMILARITY.
FT DISULFID    700    710  BY SIMILARITY.
FT DISULFID    705    719  BY SIMILARITY.
FT DISULFID    721    730  BY SIMILARITY.
FT DISULFID    739    750  BY SIMILARITY.
FT DISULFID    744    759  BY SIMILARITY.
FT DISULFID    761    770  BY SIMILARITY.
FT DISULFID    776    787  BY SIMILARITY.
FT DISULFID    781    797  BY SIMILARITY.
FT DISULFID    799    808  BY SIMILARITY.
FT DISULFID    815    827  BY SIMILARITY.
FT DISULFID    821    836  BY SIMILARITY.
FT DISULFID    838    847  BY SIMILARITY.
FT DISULFID    854    865  BY SIMILARITY.
FT DISULFID    859    874  BY SIMILARITY.
FT DISULFID    876    885  BY SIMILARITY.
FT DISULFID    892    902  BY SIMILARITY.
FT DISULFID    897    911  BY SIMILARITY.
FT DISULFID    913    922  BY SIMILARITY.
FT DISULFID    929    940  BY SIMILARITY.
FT DISULFID    934    949  BY SIMILARITY.
FT DISULFID    951    960  BY SIMILARITY.
FT DISULFID    967    978  BY SIMILARITY.
FT DISULFID    972    987  BY SIMILARITY.
FT DISULFID    989    998  BY SIMILARITY.
FT DISULFID   1005   1016  BY SIMILARITY.
FT DISULFID   1010   1023  BY SIMILARITY.
FT DISULFID   1025   1034  BY SIMILARITY.
FT DISULFID   1041   1062  BY SIMILARITY.
FT DISULFID   1056   1071  BY SIMILARITY.
FT DISULFID   1073   1082  BY SIMILARITY.
FT DISULFID   1089   1100  BY SIMILARITY.
FT DISULFID   1094   1109  BY SIMILARITY.
FT DISULFID   1111   1120  BY SIMILARITY.
FT DISULFID   1127   1138  BY SIMILARITY.
FT DISULFID   1132   1147  BY SIMILARITY.
FT DISULFID   1149   1158  BY SIMILARITY.
FT DISULFID   1165   1183  BY SIMILARITY.
FT DISULFID   1177   1192  BY SIMILARITY.
FT DISULFID   1194   1203  BY SIMILARITY.
FT DISULFID   1210   1223  BY SIMILARITY.
FT DISULFID   1215   1233  BY SIMILARITY.
FT DISULFID   1235   1244  BY SIMILARITY.
FT DISULFID   1251   1262  BY SIMILARITY.
FT DISULFID   1256   1276  BY SIMILARITY.
FT DISULFID   1278   1287  BY SIMILARITY.
FT DISULFID   1294   1305  BY SIMILARITY.
FT DISULFID   1299   1314  BY SIMILARITY.
FT DISULFID   1316   1325  BY SIMILARITY.
FT DISULFID   1340   1351  BY SIMILARITY.
FT DISULFID   1345   1362  BY SIMILARITY.
FT DISULFID                BY SIMILARITY.



Note: remainder of annotations omitted. 

Query Match 23.0%; Score 279; DB 1; Length 2318; 

Best Local Similarity 41.7%; Pred. No. 3.68e-38;

Matches 40; Conservative 20; Mismatches 30; Indels 6; Gaps < 

Db 456 CICMAGFTGTYCEVDIDECQSSPCVNGGVCKDRVNGFSCTCPSGFSGSMC QLDV-- 509 

I 1:1:: I : hi: I II: hi l|::|| |: |:|| :| :| 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 510 DECASTPCRNGAKCVDQPDGYECRCAEGFEGTLCER 545 

I :l hllhlll! : 1:1 II I II: 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEK 99 



RESULT 10 

ID NTC1_MOUSE STANDARD; PRT; 2531 AA.
AC Q01705;
DT 01-NOV-1995 (REL. 32, CREATED)
DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1 PRECURSOR (MOTCH PROTEIN).
GN NOTCH1 OR MOTCH.
OS MUS MUSCULUS (MOUSE).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=EMBRYO;
RX MEDLINE; 93194170.
RA FRANCO DEL AMO F., GENDRON-MAGUIRE M., SWIATEK P.J., JENKINS N.A.,
RA COPELAND N.G., GRIDLEY T.;
RT "Cloning, analysis, and chromosomal localization of Notch-1, a mouse
RT homolog of Drosophila Notch.";
RL GENOMICS 15:259-264(1993).
RN [2]
RP SEQUENCE OF 1551-2170 FROM N.A.
RC TISSUE=EMBRYO;
RX MEDLINE; 93048835.
RA FRANCO DEL AMO F., SMITH D.E., SWIATEK P.J., GENDRON-MAGUIRE M.,
RA GREENSPAN R.J., MCMAHON A.P., GRIDLEY T.;
RT "Expression pattern of Motch, a mouse homolog of Drosophila Notch,
RT suggests an important role in early postimplantation mouse
RT development.";
RL DEVELOPMENT 115:737-744(1992).
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DEVELOPMENTAL STAGE: EXPRESSED ALMOST UNIFORMLY IN EARLY EMBRYOS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; mm; G288503; -.
DR MGD; MGI:97363; NOTCH1.
DR PROSITE; PS00010; ASX_HYDROXYL; 22.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 27.
DR PROSITE; PS01187; EGF_CA; 21.
DR PFAM; PF00008; EGF; 35.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.


FT SIGNAL        1     18  POTENTIAL.
FT CHAIN        19   2531  NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1.
FT DOMAIN       19   1725  EXTRACELLULAR (POTENTIAL).
FT TRANSMEM   1726   1746  POTENTIAL.
FT DOMAIN     1747   2531  CYTOPLASMIC (POTENTIAL).
FT DOMAIN       24   1425  36 X EGF-TYPE REPEATS.
FT DOMAIN     1449   1462  CYS-RICH.
FT DOMAIN     1445   1562  3 X LIN/NOTCH REPEATS.
FT REPEAT     1445   1480  LIN/NOTCH 1.
FT REPEAT     1481   1522  LIN/NOTCH 2.
FT REPEAT     1523   1562  LIN/NOTCH 3.
FT DOMAIN     1865   2075  6 X ANK MOTIF REPEATS.
FT REPEAT     1865   1910  ANK MOTIF 1.
FT REPEAT     1912   1942  ANK MOTIF 2.
FT REPEAT     1944   1975  ANK MOTIF 3.
FT REPEAT     1978   2009  ANK MOTIF 4.
FT REPEAT     2011   2042  ANK MOTIF 5.
FT REPEAT     2044   2075  ANK MOTIF 6.
FT CARBOHYD    888    888  POTENTIAL.
FT CARBOHYD    959    959  POTENTIAL.
FT CARBOHYD   1179   1179  POTENTIAL.
FT CARBOHYD   1241   1241  POTENTIAL.
FT CARBOHYD   1489   1489  POTENTIAL.
FT CARBOHYD   1587   1587  POTENTIAL.
SQ SEQUENCE 2531 AA; 271312 MW; AD71189B CRC32;



Query Match 22.5%; Score 273; DB 1; Length 2531;

Best Local Similarity 43.3%; Pred. No. 7.61e-37;

Matches 42; Conservative 12; Mismatches 41; Indels 2; Gaps 2; 

Db 1169 CKCVAGYHGSNCSEEINECLSQPCQNGGTCIDLTNSYKCSCPRGTQGVHCEINVDDCHPP 1228 

I l::M I lllh ::| : lllh |:| III | |: | | I! : 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHL-PAP 62 

Db 1229 LDPASRSPKCFNNGTCVDQVGGYTCTCPPGFVGERCE 1265 

I : I I : Nil I I III I II 
Qy 63 KSP-CEGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98 



RESULT 11
ID NTC1_HUMAN STANDARD; PRT; 2444 AA.
AC P46531;
DT 01-NOV-1995 (REL. 32, CREATED)
DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)
DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG 1 PRECURSOR (TRANSLOCATION-
DE ASSOCIATED NOTCH PROTEIN TAN-1) (FRAGMENT).
GN NOTCH1 OR TAN1.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 91347367.
RA ELLISEN L.W., BIRD J., WEST D.C., SORENG A.L., REYNOLDS T.C.,
RA SMITH S.D., SKLAR J.;
RT "TAN-1, the human homolog of the Drosophila notch gene, is broken by
RT chromosomal translocations in T lymphoblastic neoplasms.";
RL CELL 66:649-661(1991).
CC -!- FUNCTION: MAY BE IMPORTANT FOR NORMAL LYMPHOCYTE FUNCTION. IN
CC     ALTERED FORM, MAY CONTRIBUTE TO TRANSFORMATION OR PROGRESSION
CC     IN SOME T-CELL NEOPLASMS.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- TISSUE SPECIFICITY: IN FETAL TISSUES MOST ABUNDANT IN SPLEEN,
CC     BRAIN STEM AND LUNG. ALSO PRESENT IN MOST ADULT TISSUES WHERE IT
CC     IS FOUND MAINLY IN LYMPHOID TISSUES.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; M73980; G338675; -.
DR MIM; 190198; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 20.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 26.
DR PROSITE; PS01187; EGF_CA; 18.
DR PFAM; PF00008; EGF; 35.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.



FT SIGNAL        1     18  POTENTIAL.
FT CHAIN        19  >2444  NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG 1.
FT DOMAIN       19   1736  EXTRACELLULAR (POTENTIAL).
FT TRANSMEM   1737   1757  POTENTIAL.
FT DOMAIN     1758  >2444  CYTOPLASMIC (POTENTIAL).
FT DOMAIN       20     58  EGF-LIKE 1.
FT DOMAIN       59     99  EGF-LIKE 2.
FT DOMAIN      102    139  EGF-LIKE 3.
FT DOMAIN      140    176  EGF-LIKE 4.
FT DOMAIN      178    216  EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      218    255  EGF-LIKE 6.
FT DOMAIN      257    293  EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      295    333  EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      335    371  EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      372    410  EGF-LIKE 10.
FT DOMAIN      412    450  EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      452    488  EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      490    526  EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      528    564  EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      566    601  EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      603    639  EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      641    676  EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      678    714  EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      716    751  EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      753    789  EGF-LIKE 20.
FT DOMAIN      791    827  EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      829    868  EGF-LIKE 22.
FT DOMAIN      870    906  EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      908    944  EGF-LIKE 24.
FT DOMAIN      946    982  EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      984   1020  EGF-LIKE 26.
FT DOMAIN     1022   1058  EGF-LIKE 27.
FT DOMAIN     1060   1096  EGF-LIKE 28.
FT DOMAIN     1098   1144  EGF-LIKE 29.
FT DOMAIN     1146   1182  EGF-LIKE 30.
FT DOMAIN     1184   1220  EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN     1222   1266  EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN     1268   1306  EGF-LIKE 33.
FT DOMAIN     1308   1347  EGF-LIKE 34.
FT DOMAIN     1349   1385  EGF-LIKE 35.
FT DOMAIN     1388   1427  EGF-LIKE 36.
FT DOMAIN     1446   1563  3 X LIN/NOTCH REPEATS.
FT REPEAT     1446   1481  LIN/NOTCH 1.
FT REPEAT     1482   1523  LIN/NOTCH 2.
FT REPEAT     1524   1563  LIN/NOTCH 3.
FT DOMAIN     1876   2087  6 X ANK MOTIF REPEATS.
FT REPEAT     1876   1921  ANK MOTIF 1.
FT REPEAT     1923   1954  ANK MOTIF 2.
FT REPEAT     1956   1987  ANK MOTIF 3.
FT REPEAT     1990   2021  ANK MOTIF 4.
FT REPEAT     2023   2054  ANK MOTIF 5.
FT REPEAT     2056   2087  ANK MOTIF 6.
FT DOMAIN     1576   1579  POLY-VAL.
FT DOMAIN     1662   1665  POLY-ARG.
FT DOMAIN     1729   1732  POLY-PRO.
FT DOMAIN     1741   1744  POLY-ALA.
FT DOMAIN     1902   1905  POLY-GLU.
FT DOMAIN     2260   2263  POLY-GLY.
FT DOMAIN     2404   2407  POLY-GLN.
FT DOMAIN     2411   2418  POLY-PRO.
FT DISULFID     24     37  BY SIMILARITY.
FT DISULFID     31     46  BY SIMILARITY.
FT DISULFID     48     57  BY SIMILARITY.
FT DISULFID     63     74  BY SIMILARITY.
FT DISULFID     68     87  BY SIMILARITY.
FT DISULFID     89     98  BY SIMILARITY.
FT DISULFID    106    117  BY SIMILARITY.
FT DISULFID    111    127  BY SIMILARITY.
FT DISULFID    129    138  BY SIMILARITY.
FT DISULFID    144    155  BY SIMILARITY.
FT DISULFID    149    164  BY SIMILARITY.
FT DISULFID    166    175  BY SIMILARITY.
FT DISULFID    182    195  BY SIMILARITY.
FT DISULFID    189    204  BY SIMILARITY.
FT DISULFID    206    215  BY SIMILARITY.
FT DISULFID    222    233  BY SIMILARITY.
FT DISULFID    227    243  BY SIMILARITY.
FT DISULFID    245    254  BY SIMILARITY.
FT DISULFID    261    272  BY SIMILARITY.
FT DISULFID    266    281  BY SIMILARITY.
FT DISULFID    283    292  BY SIMILARITY.
FT DISULFID    299    312  BY SIMILARITY.
FT DISULFID    306    321  BY SIMILARITY.
FT DISULFID    323    332  BY SIMILARITY.
FT DISULFID    339    350  BY SIMILARITY.
FT DISULFID    344    359  BY SIMILARITY.
FT DISULFID    361    370  BY SIMILARITY.
FT DISULFID    376    387  BY SIMILARITY.
FT DISULFID    381    398  BY SIMILARITY.
FT DISULFID    400    409  BY SIMILARITY.
FT DISULFID    416    429  BY SIMILARITY.
FT DISULFID    423    438  BY SIMILARITY.
FT DISULFID    440    449  BY SIMILARITY.
FT DISULFID    456    467  BY SIMILARITY.
FT DISULFID    461    476  BY SIMILARITY.
FT DISULFID    478    487  BY SIMILARITY.
FT DISULFID    494    505  BY SIMILARITY.
FT DISULFID    499    514  BY SIMILARITY.
FT DISULFID    516    525  BY SIMILARITY.
FT DISULFID    532    543  BY SIMILARITY.
FT DISULFID    537    552  BY SIMILARITY.
FT DISULFID    554    563  BY SIMILARITY.
FT DISULFID    570    580  BY SIMILARITY.
FT DISULFID    575    589  BY SIMILARITY.
FT DISULFID    591    600  BY SIMILARITY.
FT DISULFID    607    618  BY SIMILARITY.
FT DISULFID    612    627  BY SIMILARITY.
FT DISULFID    629    638  BY SIMILARITY.
FT DISULFID    645    655  BY SIMILARITY.
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FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID
FT DISULFID
FT DISULFID
FT DISULFID

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 



650 
666 
682 
687 
704 
720 
725 
741 
757 
762 
779 
795 



664 
675 
693 
702 
713 
730 
739 
750 
768 
777 
788 
806 
815 
826 



855 
867 
885 
894 
905 
923 
932 
943 



817 
833 
838 
857 
874 
879 
896 
912 
917 
934 

993 1008 

1010 1019 

1026 1037 

1031 1046 

1048 1057 

1064 1075 

1069 1084 

1086 1095 

1102 1123 

1117 1132 

1134 1143 

1150 1161 

1155 1170 

1172 1181 

1188 1199 



BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 



Note: remainder of annotations omitted. 






Query Match 22.4%; Score 272; DB 1; Length 2444;

Best Local Similarity 44.3%; Pred. No. 1.26e-36;
Matches 43; Conservative 11; Mismatches 41; Indels 2; Gaps 2;



Db 1170 CKCVAGYHGVNCSEEIDECLSHPCQNGGTCLDLPNTYKCSCPRGTQGVHCEINVDDCNPP 1229
I h:|l I INI: hi .1 Nil: |:| |:| I |: I I ||| ;| 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHL-PAP 62 

Db 1230 VDPVSRSPRCFNNGTCVDQVGGYSCTCPPGFVGERCE 1266

I : I I : llll I I III I II 
Qy 63 KSPCE-GTECQNGANCVDQGNRPVCQCLPGFGGPECE 98 



RESULT 12 

ID NTC1_RAT STANDARD; PRT; 2531 AA.
AC Q07008;
DT 01-NOV-1995 (REL. 32, CREATED)
DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1 PRECURSOR.
GN NOTCH1.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=SCHWANN CELL;
RX MEDLINE; 92111383.
RA WEINMASTER G., ROBERTS V.J., LEMKE G.;
RT "A homolog of Drosophila Notch expressed during mammalian
RT development.";
RL DEVELOPMENT 113:199-205(1991).
CC -!- FUNCTION: REQUIRED FOR THE CORRECT DIFFERENTIATION OF A NUMBER
CC     OF TISSUES.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DEVELOPMENTAL STAGE: IN THE EMBRYO, HIGHEST LEVELS OCCUR BETWEEN
CC     DAYS 12 AND 14 AND DECREASE RAPIDLY TO MUCH LOWER LEVELS IN THE
CC     ADULT.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; X57405; G57635; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 22.
DR PROSITE; PS00022; EGF_1; 35.
DR PROSITE; PS01186; EGF_2; 26.
DR PROSITE; PS01187; EGF_CA; 21.
DR PFAM; PF00008; EGF; 35.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.


FT SIGNAL        1     18  POTENTIAL.
FT CHAIN        19   2531  NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1.
FT DOMAIN       19   1723  EXTRACELLULAR (POTENTIAL).
FT TRANSMEM   1724   1746  POTENTIAL.
FT DOMAIN     1747   2531  CYTOPLASMIC (POTENTIAL).
FT DOMAIN       20     58  EGF-LIKE 1.
FT DOMAIN       59     99  EGF-LIKE 2.
FT DOMAIN      102    139  EGF-LIKE 3.
FT DOMAIN      140    176  EGF-LIKE 4.
FT DOMAIN      178    216  EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      218    255  EGF-LIKE 6.
FT DOMAIN      257    293  EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      295    333  EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      335    371  EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      372    410  EGF-LIKE 10.
FT DOMAIN      412    450  EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      452    488  EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      490    526  EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      528    564  EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      566    601  EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      603    639  EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      641    676  EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      678    714  EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      716    751  EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      753    789  EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      791    827  EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      829    867  EGF-LIKE 22.
FT DOMAIN      869    905  EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      907    943  EGF-LIKE 24.
FT DOMAIN      945    981  EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      983   1019  EGF-LIKE 26.
FT DOMAIN     1021   1057  EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN     1059   1095  EGF-LIKE 28.
FT DOMAIN     1097   1143  EGF-LIKE 29.
FT DOMAIN     1145   1181  EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN     1183   1219  EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN     1221   1265  EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN     1267   1305  EGF-LIKE 33.
FT DOMAIN     1307   1346  EGF-LIKE 34.
FT DOMAIN     1348   1384  EGF-LIKE 35.
FT DOMAIN     1387   1426  EGF-LIKE 36.
FT DOMAIN     1449   1462  CYS-RICH.
FT DOMAIN     1865   2076  6 X ANK MOTIF REPEATS.
FT REPEAT     1865   1910  ANK MOTIF 1.
FT REPEAT     1912   1942  ANK MOTIF 2.
FT REPEAT     1944   1975  ANK MOTIF 3.
FT REPEAT     1978   2009  ANK MOTIF 4.
FT REPEAT     2011   2042  ANK MOTIF 5.
FT REPEAT     2044   2076  ANK MOTIF 6.


FT DISULFID     24     37  BY SIMILARITY.
FT DISULFID     31     46  BY SIMILARITY.
FT DISULFID     48     57  BY SIMILARITY.
FT DISULFID     63     74  BY SIMILARITY.
FT DISULFID     68     87  BY SIMILARITY.
FT DISULFID     89     98  BY SIMILARITY.
FT DISULFID    106    117  BY SIMILARITY.
FT DISULFID    111    127  BY SIMILARITY.
FT DISULFID    129    138  BY SIMILARITY.
FT DISULFID    144    155  BY SIMILARITY.
FT DISULFID    149    164  BY SIMILARITY.
FT DISULFID    166    175  BY SIMILARITY.
FT DISULFID    182    195  BY SIMILARITY.
FT DISULFID    189    204  BY SIMILARITY.
FT DISULFID    206    215  BY SIMILARITY.
FT DISULFID    222    233  BY SIMILARITY.
FT DISULFID    227    243  BY SIMILARITY.
FT DISULFID    245    254  BY SIMILARITY.
FT DISULFID    261    272  BY SIMILARITY.
FT DISULFID    266    281  BY SIMILARITY.
FT DISULFID    283    292  BY SIMILARITY.
FT DISULFID    299    312  BY SIMILARITY.
FT DISULFID    306    321  BY SIMILARITY.
FT DISULFID    323    332  BY SIMILARITY.
FT DISULFID    339    350  BY SIMILARITY.
FT DISULFID    344    359  BY SIMILARITY.
FT DISULFID    361    370  BY SIMILARITY.
FT DISULFID    376    387  BY SIMILARITY.
FT DISULFID    381    398  BY SIMILARITY.
FT DISULFID    400    409  BY SIMILARITY.
FT DISULFID    416    429  BY SIMILARITY.
FT DISULFID    423    438  BY SIMILARITY.
FT DISULFID    440    449  BY SIMILARITY.
FT DISULFID    456    467  BY SIMILARITY.
FT DISULFID    461    476  BY SIMILARITY.
FT DISULFID    478    487  BY SIMILARITY.
FT DISULFID    494    505  BY SIMILARITY.
FT DISULFID    499    514  BY SIMILARITY.
FT DISULFID    516    525  BY SIMILARITY.
FT DISULFID    532    543  BY SIMILARITY.
FT DISULFID    537    552  BY SIMILARITY.
FT DISULFID    554    563  BY SIMILARITY.
FT DISULFID    570    580  BY SIMILARITY.
FT DISULFID    575    589  BY SIMILARITY.
FT DISULFID    591    600  BY SIMILARITY.
FT DISULFID    507    618  BY SIMILARITY.
FT DISULFID    612    627  BY SIMILARITY.
FT DISULFID    629    638  BY SIMILARITY.
FT DISULFID    645    655  BY SIMILARITY.
FT DISULFID    650    664  BY SIMILARITY.
FT DISULFID    666    675  BY SIMILARITY.
FT DISULFID    682    693  BY SIMILARITY.
FT DISULFID    687    702  BY SIMILARITY.
FT DISULFID    704    713  BY SIMILARITY.
FT DISULFID    720    730  BY SIMILARITY.
FT DISULFID    725    739  BY SIMILARITY.
FT DISULFID    741    750  BY SIMILARITY.
FT DISULFID    757    768  BY SIMILARITY.
FT DISULFID    762    777  BY SIMILARITY.
FT DISULFID    779    788  BY SIMILARITY.
FT DISULFID    795    806  BY SIMILARITY.
FT DISULFID    800    815  BY SIMILARITY.
FT DISULFID    817    826  BY SIMILARITY.
FT DISULFID    833    844  BY SIMILARITY.


FT DISULFID    838    855  BY SIMILARITY.
FT DISULFID    857    866  BY SIMILARITY.
FT DISULFID           884  BY SIMILARITY.
FT DISULFID    878    893  BY SIMILARITY.
FT DISULFID    895    904  BY SIMILARITY.
FT DISULFID    911         BY SIMILARITY.
FT DISULFID    916    931  BY SIMILARITY.
FT DISULFID    933         BY SIMILARITY.
FT DISULFID    987    998  BY SIMILARITY.
FT DISULFID    992   1007  BY SIMILARITY.
FT DISULFID   1009   1018  BY SIMILARITY.
FT DISULFID   1025   1036  BY SIMILARITY.
FT DISULFID   1030   1045  BY SIMILARITY.
FT DISULFID                BY SIMILARITY.
FT DISULFID   1063   1074  BY SIMILARITY.
FT DISULFID                BY SIMILARITY.
FT DISULFID   1085   1094  BY SIMILARITY.
FT DISULFID   1101   1122  BY SIMILARITY.
FT DISULFID          1131  BY SIMILARITY.
FT DISULFID   1133         BY SIMILARITY.
FT DISULFID   1149         BY SIMILARITY.
FT DISULFID   1154   1169  BY SIMILARITY.
FT DISULFID   1171   1180  BY SIMILARITY.
FT DISULFID   1187   1198  BY SIMILARITY.
FT DISULFID   1192   1207  BY SIMILARITY.
FT DISULFID   1209   1218  BY SIMILARITY.
FT DISULFID   1225   1244  BY SIMILARITY.
FT DISULFID   1238   1253  BY SIMILARITY.
FT DISULFID   1255   1264  BY SIMILARITY.
FT DISULFID   1271         BY SIMILARITY.
FT DISULFID   1276   1293  BY SIMILARITY.
FT DISULFID   1295   1304  BY SIMILARITY.
FT DISULFID   1311   1322  BY SIMILARITY.
FT DISULFID   1316   1334  BY SIMILARITY.
FT DISULFID   1336   1345  BY SIMILARITY.
FT DISULFID   1352   1363  BY SIMILARITY.
FT DISULFID   1357   1372  BY SIMILARITY.
FT DISULFID   1374   1383  BY SIMILARITY.
FT DISULFID   1391   1403  BY SIMILARITY.



Note: remainder of annotations omitted. 



Query Match 22.4%; Score 272; DB 1; Length 2531; 

Best Local Similarity 42.1%; Pred. No. 1.26e-36;

Matches 40; Conservative 18; Mismatches 31; Indels 6; Gaps 

Db 931 CDCLPGFQGAFCEEDINECATNPCQNGANCTDCVDSYTCTCPTGFNGIHCE---N-NTP- 985

1:1:11: I I I: ::| : Mill I I Ml: |: |::| || : :| 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 986 D-CTESSCFNGGTCVDGINSFTCLCPPGFTGSYCQ 1019 

I : I II: III I I I' III I: I: 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98 



RESULT 13 

ID NTC4_MOUSE STANDARD; PRT; 1964 AA.
AC P31695; Q62389;
DT 01-JUL-1993 (REL. 26, CREATED)
DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 4 PRECURSOR (TRANSFORMING
DE PROTEIN INT-3).
GN NOTCH4 OR INT3 OR INT-3.
OS MUS MUSCULUS (MOUSE).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 92194507.
RA ROBBINS J., BLONDEL B.J., GALLAHAN D., CALLAHAN R.;
RT "Mouse mammary tumor gene int-3: a member of the notch gene family
RT transforms mammary epithelial cells.";
RL J. VIROL. 66:2594-2599(1992).
RN [2]
RP REVISIONS, SEQUENCE FROM N.A.
RA CALLAHAN R.;
RL SUBMITTED (NOV-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.
RN [3]
RP SEQUENCE FROM N.A.
RC TISSUE=LUNG, AND TESTIS;
RX MEDLINE; 96281668.
RA UYTTENDAELE H., MARAZZI G., WU G., YAN Q., SASSOON D., KITAJEWSKI J.;
RT "Notch4/int-3, a mammary proto-oncogene, is an endothelial
RT cell-specific mammalian Notch gene.";
RL DEVELOPMENT 122:2251-2259(1996).
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DISEASE: ACTIVATED INT-3 TRANSFORMS MAMMARY EPITHELIAL CELLS.
CC -!- SIMILARITY: CONTAINS 29 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 CDC10/SWI6 REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; M80456; G1714084; -.
DR EMBL; U43691; G1401160; -.
DR PIR; A38072; TVMVT3.
DR MGD; MGI:107471; NOTCH4.
DR PROSITE; PS00010; ASX_HYDROXYL; 11.
DR PROSITE; PS00022; EGF_1; 28.
DR PROSITE; PS01186; EGF_2; 21.
DR PROSITE; PS01187; EGF_CA; 9.
DR PFAM; PF00008; EGF; 26.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 2.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE;
KW GLYCOPROTEIN; PROTO-ONCOGENE; ANK REPEAT; SIGNAL.



FT SIGNAL        1     20  POTENTIAL.
FT CHAIN        21   1964  NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 4.
FT DOMAIN       21   1443  EXTRACELLULAR (POTENTIAL).
FT TRANSMEM   1444   1464  POTENTIAL.
FT DOMAIN     1465   1964  CYTOPLASMIC (POTENTIAL).
FT DOMAIN       21     60  EGF-LIKE 1.
FT DOMAIN       61    112  EGF-LIKE 2.
FT DOMAIN      115    152  EGF-LIKE 3.
FT DOMAIN      153    189  EGF-LIKE 4.
FT DOMAIN      191    229  EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      231    271  EGF-LIKE 6.
FT DOMAIN      273    309  EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      311    350  EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      352    388  EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      389    427  EGF-LIKE 10.
FT DOMAIN      429    470  EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      472    508  EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      510    546  EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      548    584  EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      586    622  EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      622    656  EGF-LIKE 16.
FT DOMAIN      658    686  EGF-LIKE 17.
FT DOMAIN      688    724  EGF-LIKE 18.
FT DOMAIN      726    762  EGF-LIKE 19.
FT DOMAIN      764    800  EGF-LIKE 20.
FT DOMAIN      803    839  EGF-LIKE 21.
FT DOMAIN      841    877  EGF-LIKE 22.
FT DOMAIN      878    924  EGF-LIKE 23.
FT DOMAIN      926    962  EGF-LIKE 24.
FT DOMAIN      964   1000  EGF-LIKE 25.
FT DOMAIN     1002   1040  EGF-LIKE 26.
FT DOMAIN     1042   1081  EGF-LIKE 27.
FT DOMAIN     1083   1122  EGF-LIKE 28.
FT DOMAIN     1126   1167  EGF-LIKE 29.
FT DOMAIN     1168   1282  3 X LIN/NOTCH REPEATS.
FT REPEAT     1168   1208  LIN/NOTCH 1.
FT REPEAT     1209   1242  LIN/NOTCH 2.
FT REPEAT     1243   1282  LIN/NOTCH 3.
FT DOMAIN     1572   1785  6 X ANK MOTIF REPEATS.
FT REPEAT     1572   1603  ANK MOTIF 1.
FT REPEAT     1622   1653  ANK MOTIF 2.
FT REPEAT     1654   1685  ANK MOTIF 3.
FT REPEAT     1688   1719  ANK MOTIF 4.
FT REPEAT     1721   1752  ANK MOTIF 5.
FT REPEAT     1754   1785  ANK MOTIF 6.


FT DISULFID     25     38  BY SIMILARITY.
FT DISULFID     32     48  BY SIMILARITY.
FT DISULFID     50     59  BY SIMILARITY.
FT DISULFID     65     77  BY SIMILARITY.
FT DISULFID     71    100  BY SIMILARITY.
FT DISULFID    102    111  BY SIMILARITY.
FT DISULFID    119    130  BY SIMILARITY.
FT DISULFID    124    140  BY SIMILARITY.
FT DISULFID    142    151  BY SIMILARITY.
FT DISULFID    157    168  BY SIMILARITY.
FT DISULFID    162    177  BY SIMILARITY.
FT DISULFID    179    188  BY SIMILARITY.
FT DISULFID    195    208  BY SIMILARITY.
FT DISULFID    202    217  BY SIMILARITY.
FT DISULFID    219    228  BY SIMILARITY.
FT DISULFID    235    246  BY SIMILARITY.
FT DISULFID    240    259  BY SIMILARITY.
FT DISULFID    261    270  BY SIMILARITY.
FT DISULFID    235    246  BY SIMILARITY.
FT DISULFID    240    259  BY SIMILARITY.
FT DISULFID    261    270  BY SIMILARITY.
FT DISULFID    277    288  BY SIMILARITY.
FT DISULFID    282    297  BY SIMILARITY.
FT DISULFID    299    308  BY SIMILARITY.
FT DISULFID    315    329  BY SIMILARITY.
FT DISULFID    323    338  BY SIMILARITY.
FT DISULFID    340    349  BY SIMILARITY.
FT DISULFID    356    367  BY SIMILARITY.
FT DISULFID    361    376  BY SIMILARITY.
FT DISULFID    378    387  BY SIMILARITY.
FT DISULFID    393    404  BY SIMILARITY.
FT DISULFID    398    415  BY SIMILARITY.
FT DISULFID    417    426  BY SIMILARITY.
FT DISULFID    433    449  BY SIMILARITY.
FT DISULFID    443    458  BY SIMILARITY.
FT DISULFID    460    469  BY SIMILARITY.
FT DISULFID    476    487  BY SIMILARITY.
FT DISULFID    481    496  BY SIMILARITY.
FT DISULFID    498    507  BY SIMILARITY.
FT DISULFID    514    525  BY SIMILARITY.
FT DISULFID    519    534  BY SIMILARITY.
FT DISULFID    536    545  BY SIMILARITY.
FT DISULFID    552    563  BY SIMILARITY.
FT DISULFID    557    572  BY SIMILARITY.
FT DISULFID    574    583  BY SIMILARITY.
FT DISULFID    590    601  BY SIMILARITY.
FT DISULFID    595    610  BY SIMILARITY.
FT DISULFID    612    621  BY SIMILARITY.
FT DISULFID    626    637  BY SIMILARITY.
FT DISULFID    631    646  BY SIMILARITY.
FT DISULFID    648    655  BY SIMILARITY.
FT DISULFID    662    669  BY SIMILARITY.
FT DISULFID    664    674  BY SIMILARITY.
FT DISULFID    676    685  BY SIMILARITY.
FT DISULFID    692    703  BY SIMILARITY.
FT DISULFID    697    712  BY SIMILARITY.
FT DISULFID    714    723  BY SIMILARITY.
FT DISULFID    730    741  BY SIMILARITY.
FT DISULFID    735    750  BY SIMILARITY.
FT DISULFID    752    761  BY SIMILARITY.
FT DISULFID    768    779  BY SIMILARITY.
FT DISULFID    773    788  BY SIMILARITY.
FT DISULFID    790    799  BY SIMILARITY.
FT DISULFID    807    818  BY SIMILARITY.
FT DISULFID    812    827  BY SIMILARITY.
FT DISULFID    829    838  BY SIMILARITY.
FT DISULFID    845    856  BY SIMILARITY.
FT DISULFID    850    865  BY SIMILARITY.
FT DISULFID    867    876  BY SIMILARITY.
FT DISULFID    882    903  BY SIMILARITY.
FT DISULFID    897    912  BY SIMILARITY.
FT DISULFID    914    923  BY SIMILARITY.
FT DISULFID    930    941  BY SIMILARITY.
FT DISULFID    935    950  BY SIMILARITY.
FT DISULFID    952    961  BY SIMILARITY.
FT DISULFID    968    979  BY SIMILARITY.
FT DISULFID    973    988  BY SIMILARITY.
FT DISULFID    990    999  BY SIMILARITY.
FT DISULFID   1006   1019  BY SIMILARITY.
FT DISULFID   1011   1028  BY SIMILARITY.
FT DISULFID   1030   1039  BY SIMILARITY.
FT DISULFID   1046   1057  BY SIMILARITY.
FT DISULFID   1051   1069  BY SIMILARITY.
FT DISULFID   1071   1080  BY SIMILARITY.
FT DISULFID   1087   1098  BY SIMILARITY.
FT DISULFID   1092   1110  BY SIMILARITY.
FT DISULFID   1112   1121  BY SIMILARITY.
FT DISULFID   1130   1142  BY SIMILARITY.
FT DISULFID   1136   1155  BY SIMILARITY.
FT DISULFID   1157   1166  BY SIMILARITY.
FT CARBOHYD    711    711  POTENTIAL.
FT CARBOHYD    960    960  POTENTIAL.
FT CARBOHYD   1139   1139  POTENTIAL.
FT CONFLICT     43     43  Q -> R (IN REF. 3).
FT CONFLICT    298    298  L -> P (IN REF. 3).
FT CONFLICT    884    884  M -> K (IN REF. 3).

Note: remainder of annotations omitted.



Query Match 22.2%; Score 269; DB 1; Length 1964;

Best Local Similarity 38.0%; Pred. No. 5.69e-36;

Matches 38; Conservative 22; Mismatches 37; Indels 3; Gaps 3; 

Db 825 ARCLCSPGYTGSSCQTLIDLCARKPCPHTARCLQSGPSFQCLCLQGWTGALCDFPLSCQM 884 

: 1:1 :l II I : |:|:: |: III :| :| ||::| : :: 

Qy 2 PRCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIP-P-HL 59

Db 885 AAMSQGIEISGLCQNGGLCIDTGSSYFCRCPPGFQGKLCQ 924 
^ :| I : MM: |:| |: |:| Ml I |: 
Qy 60 PAPKSPCEGTE-CQNGANCVDQGNRPVCQCLPGFGGPECE 98



RESULT 14 

ID NOTC_DROME STANDARD; PRT; 2703 AA.
AC P07207; P04154;
DT 01-NOV-1986 (REL. 03, CREATED)
DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH PROTEIN PRECURSOR.
GN N.
OS DROSOPHILA MELANOGASTER (FRUIT FLY).
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 86079539.
RA WHARTON K.A., JOHANSEN K.M., XU T., ARTAVANIS-TSAKONAS S.;
RT "Nucleotide sequence from the neurogenic locus notch implies a gene
RT product that shares homology with proteins containing EGF-like
RT repeats.";
RL CELL 43:567-581(1985).
RN [2]
RP SEQUENCE FROM N.A.
RC STRAIN=OREGON-R;
RX MEDLINE; 87064624.
RA KIDD S., KELLEY M.R., YOUNG M.W.;
RT "Sequence of the notch locus of Drosophila melanogaster: relationship
RT of the encoded protein to mammalian clotting and growth factors.";
RL MOL. CELL. BIOL. 6:3094-3108(1986).
RN [3]
RP SEQUENCE OF 2505-2611 FROM N.A.
RX MEDLINE; 85099329.
RA WHARTON K.A., YEDVOBNICK B., FINNERTY V.G., ARTAVANIS-TSAKONAS S.;
RT "opa: a novel family of transcribed repeats shared by the Notch locus
RT and other developmentally regulated loci in D. melanogaster.";
RL CELL 40:55-62(1985).
RN [4]
RP SEQUENCE OF 1-8 FROM N.A.
RX MEDLINE; 87257846.
RA KELLEY M.R., KIDD S., BERG R.L., YOUNG M.W.;
RT "Restriction of P-element insertions at the Notch locus of Drosophila
RT melanogaster.";
RL MOL. CELL. BIOL. 7:1545-1548(1987).
RN [5]
RP REVIEW.
RA HARRIS W.A.;
RT "Many cell types specified by Notch function.";
RL CURR. BIOL. 1:120-122(1991).

CC -!- FUNCTION: NOTCH PROTEIN IS ESSENTIAL FOR PROPER DIFFERENTIATION OF
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- SEPARATION OF NEUROBLASTS FROM THE ECTODERM INTO THE INNER PART
CC     OF EMBRYO IS ONE OF THE FIRST STEPS OF CNS DEVELOPMENT IN INSECTS.
CC     THIS PROCESS IS UNDER CONTROL OF THE NEUROGENIC GENES.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
DR EMBL; M16152; G157988; -.
DR EMBL; M16153; G157988; JOINED.
DR EMBL; M16149; G157988; JOINED.
DR EMBL; M16150; G157988; JOINED.
DR EMBL; M16151; G157988; JOINED.
DR EMBL; K03508; G157993; -.
DR EMBL; M13689; G157993; JOINED.
DR EMBL; K03507; G157993; JOINED.
DR EMBL; M12175; G950317; -.
DR EMBL; M16025; G157995; -.
DR PIR; A24420; A24420.
DR PIR; A24768; A24768.
DR PIR; A05267; A05267.
DR FLYBASE; FBgn0004647; N.
DR PROSITE; PS00010; ASX_HYDROXYL; 22.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 28.
DR PROSITE; PS01187; EGF_CA; 22.
DR PFAM; PF00008; EGF; 36.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.
FT SIGNAL        1     44  POTENTIAL.
FT CHAIN        45   2703  NEUROGENIC LOCUS NOTCH PROTEIN.
FT DOMAIN       45   1745  EXTRACELLULAR (POTENTIAL).
FT TRANSMEM   1746   1766  POTENTIAL.
FT DOMAIN     1767   2703  CYTOPLASMIC (POTENTIAL).
FT DOMAIN       58   1451  36 X EGF-TYPE REPEATS.
FT DOMAIN       58     95  EGF-LIKE 1.
FT DOMAIN       96    136  EGF-LIKE 2.
FT DOMAIN      139    176  EGF-LIKE 3.
FT DOMAIN      177    215  EGF-LIKE 4.
FT DOMAIN      217    253  EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      255    291  EGF-LIKE 6.
FT DOMAIN      293    329  EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      331    370  EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      372    408  EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      409    447  EGF-LIKE 10.
FT DOMAIN      449    486  EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      488    524  EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      526    562  EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      564    600  EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      602    637  EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      639    675  EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      677    713  EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      715    751  EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      753    789  EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      791    827  EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      829    865  EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      867    905  EGF-LIKE 22.
FT DOMAIN      907    944  EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      946    982  EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      984   1020  EGF-LIKE 25.
FT DOMAIN     1022   1058  EGF-LIKE 26, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN     1060   1096  EGF-LIKE 27.
FT DOMAIN     1098   1134  EGF-LIKE 28.
FT DOMAIN     1136   1181  EGF-LIKE 29.
FT DOMAIN     1183   1219  EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN     1221   1257  EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN     1259   1295  EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN     1297   1335  EGF-LIKE 33.
FT DOMAIN     1337   1373  EGF-LIKE 34.
FT DOMAIN     1375   1412  EGF-LIKE 35.
FT DOMAIN     1415   1451  EGF-LIKE 36.
FT DOMAIN     1475   1593  3 X LIN/NOTCH REPEATS.
FT REPEAT     1475   1513  LIN/NOTCH 1.
FT REPEAT     1514   1553  LIN/NOTCH 2.
FT REPEAT     1554   1593  LIN/NOTCH 3.
FT DOMAIN     1896   2109  6 X ANK MOTIF REPEATS.
FT DOMAIN     2538   2568  POLY-GLN (OPA-REPEAT).


FT DISULFID     62     73  BY SIMILARITY.
FT DISULFID     67     83  BY SIMILARITY.
FT DISULFID     85     94  BY SIMILARITY.
FT DISULFID    100    111  BY SIMILARITY.
FT DISULFID    105    124  BY SIMILARITY.
FT DISULFID    126    135  BY SIMILARITY.
FT DISULFID    143    154  BY SIMILARITY.
FT DISULFID    148    164  BY SIMILARITY.
FT DISULFID    166    175  BY SIMILARITY.
FT DISULFID    181    192  BY SIMILARITY.
FT DISULFID    186    203  BY SIMILARITY.
FT DISULFID    205    214  BY SIMILARITY.
FT DISULFID    221    232  BY SIMILARITY.
FT DISULFID    226    241  BY SIMILARITY.
FT DISULFID    243    252  BY SIMILARITY.
FT DISULFID    259    270  BY SIMILARITY.
FT DISULFID    264    279  BY SIMILARITY.
FT DISULFID    281    290  BY SIMILARITY.
FT DISULFID    297    308  BY SIMILARITY.
FT DISULFID    302    317  BY SIMILARITY.
FT DISULFID    319    328  BY SIMILARITY.
FT DISULFID    335    349  BY SIMILARITY.
FT DISULFID    343    358  BY SIMILARITY.
FT DISULFID    360    369  BY SIMILARITY.
FT DISULFID    376    387  BY SIMILARITY.
FT DISULFID    381    396  BY SIMILARITY.
FT DISULFID    398    407  BY SIMILARITY.
FT DISULFID    413    424  BY SIMILARITY.
FT DISULFID    418    435  BY SIMILARITY.
FT DISULFID    437    446  BY SIMILARITY.
FT DISULFID    453    465  BY SIMILARITY.
FT DISULFID    459    474  BY SIMILARITY.
FT DISULFID    476    485  BY SIMILARITY.
FT DISULFID    492    503  BY SIMILARITY.
FT DISULFID    497    512  BY SIMILARITY.
FT DISULFID    514    523  BY SIMILARITY.
FT DISULFID    530    541  BY SIMILARITY.
FT DISULFID    535    550  BY SIMILARITY.
FT DISULFID    552    561  BY SIMILARITY.
FT DISULFID    568    579  BY SIMILARITY.
FT DISULFID    573    588  BY SIMILARITY.
FT DISULFID    590    599  BY SIMILARITY.
FT DISULFID    606    616  BY SIMILARITY.
FT DISULFID    611    625  BY SIMILARITY.
FT DISULFID    627    636  BY SIMILARITY.
FT DISULFID    643    654  BY SIMILARITY.
FT DISULFID    648    663  BY SIMILARITY.
FT DISULFID    665    674  BY SIMILARITY.
FT DISULFID    681    692  BY SIMILARITY.
FT DISULFID    686    701  BY SIMILARITY.
FT DISULFID    703    712  BY SIMILARITY.
FT DISULFID    719    730  BY SIMILARITY.
FT DISULFID    724    739  BY SIMILARITY.
FT DISULFID    741    750  BY SIMILARITY.
FT DISULFID    757    768  BY SIMILARITY.
FT DISULFID    762    777  BY SIMILARITY.
FT DISULFID    779    788  BY SIMILARITY.
FT DISULFID    795    806  BY SIMILARITY.
FT DISULFID    800    815  BY SIMILARITY.
FT DISULFID    817    826  BY SIMILARITY.
FT DISULFID    833    844  BY SIMILARITY.
FT DISULFID    838    853  BY SIMILARITY.
FT DISULFID    855    864  BY SIMILARITY.



Note: remainder of annotations omitted. 

Query Match 22.2%; Score 269; DB 1; Length 2703; 

Best Local Similarity 41.5%; Pred. No. 5.69e-36;

Matches 39; Conservative 17; Mismatches 32; Indels 6; Gaps 4; 

Db 1008 CTCPLGFSGINCQTNDEDCTESSCLNGGSCIDGINGYNCSCLAGYSGANCQY--KL-N-K 1063 

I I II l::ll : I II: hi :|:|:| I |||| |: I | 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 1064 --CDSNPCLNGATCHEQNNEYTCHCPSGFTGKQC 1095

I:: I III I :| I ■ j:| :|| I :| 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPEC 97 



RESULT 15 

ID NOTC_XENLA STANDARD; PRT; 2524 AA.
AC P21783;
DT 01-MAY-1991 (REL. 18, CREATED)
DT 01-OCT-1996 (REL. 34, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG PRECURSOR (XOTCH PROTEIN).
GN XOTCH.
OS XENOPUS LAEVIS (AFRICAN CLAWED FROG).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; AMPHIBIA; BATRACHIA; ANURA;
OC MESOBATRACHIA; PIPOIDEA; PIPIDAE; XENOPODINAE; XENOPUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 90385285.
RA COFFMAN C., HARRIS W., KINTNER C.;
RT "Xotch, the Xenopus homolog of Drosophila notch.";
RL SCIENCE 249:1438-1441(1990).
RN [2]
RP REVISIONS TO 1759-1782.
RA KINTNER C.;
RL SUBMITTED (JUN-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.

CC -I- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN. 

CC -!- DEVELOPMENTAL STAGE: EXPRESSED ALMOST UNIFORMLY IN EARLY EMBRYOS.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; M33874; G1364263; -.
DR PIR; A35844; A35844.
DR PROSITE; PS00010; ASX_HYDROXYL; 23.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 29.
DR PROSITE; PS01187; EGF_CA; 21.
DR PFAM; PF00008; EGF; 36.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.
FT SIGNAL 1 19 POTENTIAL.
FT CHAIN 20 2524 NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG.
FT DOMAIN 20 1728 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 1729 1750 POTENTIAL.
FT DOMAIN 1751 2524 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 20 57 EGF-LIKE 1.
FT DOMAIN 58 99 EGF-LIKE 2.
FT DOMAIN 102 140 EGF-LIKE 3.
FT DOMAIN 141 177 EGF-LIKE 4.
FT DOMAIN 179 215 EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 217 254 EGF-LIKE 6.
FT DOMAIN 256 292 EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 294 332 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 334 370 EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 371 409 EGF-LIKE 10.
FT DOMAIN 411 449 EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 451 487 EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 489 525 EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 527 563 EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 565 600 EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 602 638 EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 640 675 EGF-LIKE 17.
FT DOMAIN 677 713 EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 715 750 EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 752 788 EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 790 826 EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 828 866 EGF-LIKE 22.
FT DOMAIN 868 904 EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 906 942 EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 944 980 EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 982 1018 EGF-LIKE 26.
FT DOMAIN 1020 1056 EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1058 1094 EGF-LIKE 28.
FT DOMAIN 1096 1142 EGF-LIKE 29.
FT DOMAIN 1144 1180 EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1182 1218 EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1220 1264 EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1266 1304 EGF-LIKE 33.
FT DOMAIN 1306 1346 EGF-LIKE 34.
FT DOMAIN 1347 1383 EGF-LIKE 35.
FT DOMAIN 1386 1424 EGF-LIKE 36.
FT DOMAIN 1441 1560 3 X LIN/NOTCH REPEATS.
FT REPEAT 1441 1478 LIN/NOTCH 1.
FT REPEAT 1479 1520 LIN/NOTCH 2.
FT REPEAT 1521 1560 LIN/NOTCH 3.
FT DOMAIN 1871 2083 6 X ANK MOTIF REPEATS.
FT DISULFID 22 35 BY SIMILARITY.
FT DISULFID 29 45 BY SIMILARITY.
FT DISULFID 47 56 BY SIMILARITY.
FT DISULFID 62 74 BY SIMILARITY.
FT DISULFID 68 87 BY SIMILARITY.
FT DISULFID 89 98 BY SIMILARITY.
FT DISULFID 106 117 BY SIMILARITY.
FT DISULFID 111 128 BY SIMILARITY.
FT DISULFID 130 139 BY SIMILARITY.
FT DISULFID 145 156 BY SIMILARITY.
FT DISULFID 150 165 BY SIMILARITY.
FT DISULFID 167 176 BY SIMILARITY.
FT DISULFID 183 194 BY SIMILARITY.
FT DISULFID 188 203 BY SIMILARITY.
FT DISULFID 205 214 BY SIMILARITY.
FT DISULFID 221 232 BY SIMILARITY.
FT DISULFID 226 242 BY SIMILARITY.
FT DISULFID 244 253 BY SIMILARITY.
FT DISULFID 260 271 BY SIMILARITY.
FT DISULFID 265 280 BY SIMILARITY.
FT DISULFID 282 291 BY SIMILARITY.
FT DISULFID 298 311 BY SIMILARITY.
FT DISULFID 305 320 BY SIMILARITY.
FT DISULFID 322 331 BY SIMILARITY.
FT DISULFID 338 349 BY SIMILARITY.
FT DISULFID 343 358 BY SIMILARITY.
FT DISULFID 360 369 BY SIMILARITY.
FT DISULFID 375 386 BY SIMILARITY.
FT DISULFID 380 397 BY SIMILARITY.
FT DISULFID 399 408 BY SIMILARITY.
FT DISULFID 415 428 BY SIMILARITY.
FT DISULFID 422 437 BY SIMILARITY.
FT DISULFID 439 448 BY SIMILARITY.
FT DISULFID 455 466 BY SIMILARITY.
FT DISULFID 460 475 BY SIMILARITY.
FT DISULFID 477 486 BY SIMILARITY.
FT DISULFID 493 504 BY SIMILARITY.
FT DISULFID 498 513 BY SIMILARITY.
FT DISULFID 515 524 BY SIMILARITY.
FT DISULFID 531 542 BY SIMILARITY.
FT DISULFID 536 551 BY SIMILARITY.
FT DISULFID 553 562 BY SIMILARITY.
FT DISULFID 569 579 BY SIMILARITY.
FT DISULFID 574 588 BY SIMILARITY.
FT DISULFID 590 599 BY SIMILARITY.
FT DISULFID 606 617 BY SIMILARITY.
FT DISULFID 611 626 BY SIMILARITY.
FT DISULFID 628 637 BY SIMILARITY.
FT DISULFID 644 654 BY SIMILARITY.
FT DISULFID 649 663 BY SIMILARITY.
FT DISULFID 665 674 BY SIMILARITY.
FT DISULFID 681 692 BY SIMILARITY.
FT DISULFID 686 701 BY SIMILARITY.
FT DISULFID 703 712 BY SIMILARITY.
FT DISULFID 719 729 BY SIMILARITY.
FT DISULFID 724 738 BY SIMILARITY.
FT DISULFID 740 749 BY SIMILARITY.
FT DISULFID 756 767 BY SIMILARITY.
FT DISULFID 761 776 BY SIMILARITY.
FT DISULFID 778 787 BY SIMILARITY.
FT DISULFID 794 805 BY SIMILARITY.
FT DISULFID 799 814 BY SIMILARITY.
FT DISULFID 816 825 BY SIMILARITY.
FT DISULFID 832 843 BY SIMILARITY.
FT DISULFID 837 854 BY SIMILARITY.
FT DISULFID 856 865 BY SIMILARITY.
FT DISULFID 872 883 BY SIMILARITY.
FT DISULFID 877 892 BY SIMILARITY.
FT DISULFID 894 903 BY SIMILARITY.
FT DISULFID 910 921 BY SIMILARITY.
FT DISULFID 915 930 BY SIMILARITY.
FT DISULFID 932 941 BY SIMILARITY.
FT DISULFID 986 997 BY SIMILARITY.
FT DISULFID 991 1006 BY SIMILARITY.
FT DISULFID 1008 1017 BY SIMILARITY.
FT DISULFID 1024 1035 BY SIMILARITY.
FT DISULFID 1029 1044 BY SIMILARITY.
FT DISULFID 1046 1055 BY SIMILARITY.
FT DISULFID 1062 1073 BY SIMILARITY.
FT DISULFID 1067 1082 BY SIMILARITY.
FT DISULFID 1084 1093 BY SIMILARITY.
FT DISULFID 1100 1121 BY SIMILARITY.
FT DISULFID 1115 1130 BY SIMILARITY.
FT DISULFID 1132 1141 BY SIMILARITY.
FT DISULFID 1148 1159 BY SIMILARITY.
FT DISULFID 1153 1168 BY SIMILARITY.
FT DISULFID 1170 1179 BY SIMILARITY.
FT DISULFID 1186 1197 BY SIMILARITY.
FT DISULFID 1191 1206 BY SIMILARITY.
FT DISULFID 1208 1217 BY SIMILARITY.
FT DISULFID 1224 1243 BY SIMILARITY.
FT DISULFID 1237 1252 BY SIMILARITY.
FT DISULFID 1254 1263 BY SIMILARITY.
FT DISULFID 1270 1283 BY SIMILARITY.
FT DISULFID 1275 1292 BY SIMILARITY.
FT DISULFID 1294 1303 BY SIMILARITY.
FT DISULFID 1310 1321 BY SIMILARITY.
FT DISULFID 1315 1333 BY SIMILARITY.
FT DISULFID 1335 1344 BY SIMILARITY.
FT DISULFID 1351 1362 BY SIMILARITY.
FT DISULFID 1356 1371 BY SIMILARITY.
FT DISULFID 1373 1382 BY SIMILARITY.
FT DISULFID 1390 1401 BY SIMILARITY.
FT DISULFID 1395 1412 BY SIMILARITY.
FT DISULFID 1414 1423 BY SIMILARITY.
FT CARBOHYD 462 462 POTENTIAL.
FT CARBOHYD 887 887 POTENTIAL.



Note: remainder of annotations omitted. 



Query Match 22.1%; Score 268; DB 1; Length 2524;

Best Local Similarity 41.7%; Pred. No. 9.41e-36; 

Matches 40; Conservative 14; Mismatches 36; Indels 6; Gaps 3; 

Db 474 QCICMPGYEGLYCETNIDECASNPCLHNGKCIDRINEFRCDCPTGFSGNLCQ--H--DF 528 

:| HIM I I I |:| : | : : |:| :| : | |: |:|| ||: | 
Qy 3 RCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAP 62 

Db 529 DE-CTSTPCKNGAKCLDGPNSYTCQCTEGFTGRHCE 563 




Qy 63 KSPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98



Search completed: Fri May 28 08:54:52 1999 
Job time: 22 secs.



Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 08:55:11 1999; MasPar time 12.95 Seconds
674.489 Million cell updates/sec

Tabular output not generated.

llllC. <>UO U7 131 04/ J
Description: (1-160) from US09191647.pep
Perfect Score: 1212
Sequence: 1 WPRCECMPGYAGDNCSENQD NGGNDHIAVXLYXGHVRFSY 160

Scoring table: PAM 150
Gap 11

Searched: 179066 seqs, 54579741 residues

Post-processing: Minimum Match 0%

Listing first 45 summaries

Database: sptrembl9

1:sp_archea 2:sp_bacteria 3:sp_fungi 4:sp_human
5:sp_invertebrate 6:sp_mammal 7:sp_mhc 8:sp_organelle
9:sp_phage 10:sp_plant 11:sp_rodent 12:sp_unclassified
13:sp_vertebrate 14:sp_virus

Statistics: Mean 40.449; Variance 82.697; scale 0.489

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.

SUMMARIES

Result       Query
  No.  Score  Match  Length  DB  ID      Description             Pred. No.
    1   1039   85.7    1531  11  O88279  MEGF4.                  3.84e-182
    2    684   56.4    1523  11  O88280  MEGF5.                  3.92e-110
    3    669   55.2     739   4  O75094  MEGF5 (FRAGMENT).       3.98e-107
    4    491   40.5     530   5  Q24526  SLIT LOCUS ENCODING A   7.89e-72
    5    480   39.6     601   5  Q20204  F40E10.4 PROTEIN (FRAG  1.12e-69
    6    322   26.6     721  13  Q91902  X-DELTA-1.              2.37e-39
    7    309   25.5     406   5  Q25059  FIBROPELLIN III (FRAGM  6.34e-37
    8    304   25.1     728  13  Q90656  TRANSMEMBRANE PROTEIN   5.40e-36
    9    303   25.0    1476  13  Q90285  PUTATIVE EXTRACELLULAR  8.28e-36
   10    301   24.8     529   5  Q25058  FIBROPELLIN IA (FRAGME  1.95e-35
   11    298   24.6     717  13  P87357  DELTAD TRANSMEMBRANE P  7.00e-35
   12    297   24.5    3313  11  O88278  MEGF2.                  1.07e-34
   13    294   24.3    1203  11  Q06008  NOTCH PROTEIN HOMOLOG   3.85e-34
   14    295   24.3    2408   4  Q92566  MYELOBLAST KIAA0279 (F  2.52e-34
   15    294   24.3    2470  11  O35516  CELL SURFACE PROTEIN.   3.85e-34
   16    291   24.0    1722   5  Q19350  SIMILAR TO EGF-LIKE RE  1.38e-33
   17    285   23.5     802  13  O57462  DELTAA.                 1.76e-32
   18    284   23.4    1193  13  Q90819  C-SERATE-1 PROTEIN (FR  2.69e-32
   19    283   23.3     752  13  O42374  NOTCH RECEPTOR PROTEIN  4.11e-32
   20    283   23.3    1218   4  O14902  TRANSMEMBRANE PROTEIN   4.11e-32
   21    283   23.3    1218   4  Q15816  TRANSMEMBRANE PROTEIN   4.11e-32
   22    283   23.3    1219  11  Q63722  JAGGED PROTEIN.         4.11e-32
   23    283   23.3    1227   4  P78504  JAGGED 1 (TRANSMEMBRAN  4.11e-32
   24    283   23.3    2352   5  O61240  HRNOTCH PROTEIN.        4.11e-32
   25    281   23.2    1218   4  O15122  JAGGED1.                9.58e-32
   26    278   22.9    1372   5  P91526  SIMILARITY TO MULTIPLE  3.40e-31
   27    278   22.9    2653   5  Q25253  NOTCH HOMOLOG SCALLOPE  3.40e-31
   28    275   22.7    2447  13  O13149  NOTCH 2 (FRAGMENT).     1.21e-30
   29    273   22.5     387  11  Q06007  NOTCH PROTEIN HOMOLOG   2.80e-30
   30    273   22.5    2531   5  O16004  NOTCH HOMOLOG.          2.80e-30
   31    272   22.4    1212  13  O42347  C-SERRATE-2 (FRAGMENT)  4.26e-30
   32    266   21.9     615  13  O57409  DELTAB.                 5.30e-29
   33    265   21.9    1964  11  O35442  NOTCH4.                 8.05e-29
   34    262   21.6     955   4  Q99466  NOTCH4 (FRAGMENT).      2.83e-28
   35    262   21.6    1999   4  Q99940  NOTCH4.                 2.83e-28
   36    262   21.6    2003   4  O00306  NOTCH4.                 2.83e-28
   37    256   21.1     642  13  P79941  NOTCH LIGAND X-DELTA-2  3.45e-27
   38    255   21.0    1687  11  Q61204  NOTCH2-LIKE (EGF REPEA  5.24e-27
   39    248   20.5    1202  11  P97607  JAGGED2 (FRAGMENT).     9.58e-26
   40    245   20.2     434  11  O55139  JAGGED2 PROTEIN (FRAGM  3.31e-25
   41    245   20.2     518  11  O70219  JAGGED 2 (JAGGED 2 PRO  3.31e-25
   42    244   20.1     592  11  O88516  DELTA-LIKE 3 ALTERNATE  5.01e-25
   43    242   20.0     585  11  O35675  M-DELTA-LIKE 3 GENE PR  1.14e-24
   44    237   19.6     263   4  Q99734  NOTCH2 TRANSMEMBRANE P  8.94e-24
   45    235   19.4     156   5  Q26661  EPIDERMAL GROWTH FACTO  2.03e-23

ALIGNMENTS

RESULT 1

ID O88279 PRELIMINARY; PRT; 1531 AA.
AC O88279;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF4.
GN MEGF4.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011530; D1033423; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1531 AA; 167497 MW; 5C5EBDF4 CRC32;

Query Match 85.7%; Score 1039; DB 11; Length 1531;
Best Local Similarity 86.2%; Pred. No. 3.84e-182;
Matches 137; Conservative 11; Mismatches 8; Indels 3;

Db 1067
Qy 2
Db 1124
Qy 62
Db 1184
Qy 122



RESULT 2

ID O88280 PRELIMINARY; PRT; 1523 AA.
AC O88280;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF5.
GN MEGF5.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011531; D1033424; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1523 AA; 167767 MW; 2BD845D0 CRC32;

Query Match 56.4%; Score 684; DB 11; Length 1523;
Best Local Similarity 52.2%; Pred. No. 3.92e-110;
Matches 83; Conservative 34; Mismatches 41; Indels 1; Gaps 1;

Db 1059 RCECVPGYSGKLCETDNDDCVAHKCRHGAQCVDAVNGYTCICPQGFSGLFCEHPPPMVLL 1118 

1111:111:1 I : III l:|::||||:| 1 1 : 1 : 1 : 1 : : 1 : 1 1 :|| II : 
Qy 3 RCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPA- 61 

Db 1119 QTSPCDQYECQNGAQCIWQQEPTCRCPPGFAGPRCEKLITVNFVGKDSYVELASAKVRP 1178 

III: llllll I: I hi 111:11 M 1 1 : : i 1 1 1 :|:|:;:; : 
Qy 62 PKSPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFTDLQNWX 121 

Db 1179 QANISLQVATDKDNGILLYKGDNDPLALBLYQGHVRLVY 1217 

: Ihlll I llllllhl II :|: II Mil: I 
Qy 122 RXNITLQVFTAEDNGILLYNGGNDHIAVXLYXGHVRFSY 160 



RESULT 3 

ID O75094 PRELIMINARY; PRT; 739 AA.
AC O75094;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF5 (FRAGMENT).
GN MEGF5.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011538; D1033429; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1
SQ SEQUENCE 739 AA; 80364 MW; DC6BCB63 CRC32;

Query Match 55.2%; Score 669; DB 4; Length 739;

Best Local Similarity 51.3%; Pred. No. 3.98e-107; 

Matches 81; Conservative 34; Mismatches 42; Indels 1; Gaps 1; 



Db 276 CECVPGYSGKLCETDNDDCVAHKCRHGAQCVDTINGYTCTCPQGFSGPFCEHPPPMVLLQ 335 

111:111:1 I : III hl::lll|:| :|:|:| |::|:|| :|| || : 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPA-P 62 

Db 336 TSPCDQYECQNGAQCIWQQEPTCRCPPGFAGPRCEKLITVNFVGKDSYVELASAKVRPQ 395 

III: llllll I: I 1:1 llhll ||||::|||| :|:|:::: : : 
Qy 63 KSPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFTDLQNWXR 122 

Db 396 ANISLQVATDKDNGILLYKGDNDPLALELYQGHVRLVY 433 

Ihlll I 1111111:1 II :|: II Mil: I 
Qy 123 XNITLQVFTAEDNGILLYNGGNDHIAVXLYXGHVRFSY 160 



RESULT 4 

ID Q24526 PRELIMINARY; PRT; 530 AA. 

AC Q24526; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE SLIT LOCUS ENCODING A PROTEIN ASSOCIATED WITH NEURAL DEVELOPMENT WITH
DE 52D EGF HOMOLOGOUS DOMAINS (FRAGMENT).
OS DROSOPHILA MELANOGASTER (FRUIT FLY).
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=CANTON S;
RX MEDLINE; 89077533.
RA ROTHBERG J.M., HARTLEY D.A., WALTHER Z., ARTAVANIS-TSAKONAS S.;
RT "slit: an EGF-homologous locus of D. melanogaster involved in the
RT development of the embryonic central nervous system.";
RL CELL 55:1047-1059(1988).
DR EMBL; M23543; G514357; -.
DR FLYBASE; FBgn0003425; sli.
DR PROSITE; PS01186; EGF_2; 5.
DR PROSITE; PS01187; EGF_CA; 2.
DR PFAM; PF00008; EGF; 7.
DR PFAM; PF00054; laminin_G; 1.
KW NEUROGENESIS; GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 530 530
SQ SEQUENCE 530 AA; 59457 MW; 10E5764D CRC32;

Query Match 40.5%; Score 491; DB 5; Length 530;
Best Local Similarity 39.5%; Pred. No. 7.89e-72;

Matches 64; Conservative 43; Mismatches 49; Indels 6; Gaps 4; 

Db 170 CDCQAGFHGTNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMISMMY 229 

hi :h I I:: |||::| III: 1:1 :| I I I:: 1:1 || :: 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPA-- 61 

Db 230 PQTSPCQNHECKHGVCFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELEPLR 289 

I llh II :| lh :hl lh I II I |::|| :::::: h 
Qy 62 PK-SPCEGTECQNGA--NCVDQGNRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFTDLQ 118 

Db 290 TRPEANVTI-VFSSGQNGILMYDGQDAHLAVELFNGRIRVSY 330 

: hh lh: :||||:|:| : 1 : 1 1 h h:| II 
Qy 119 NWXRXNITLQVFTAEDNGILLYNGGNDHIAVXLYXGHVRFSY 160 



RESULT 5 

ID Q20204 PRELIMINARY; PRT; 601 AA.
AC Q20204;
DT 01-NOV-1996 (TREMBLREL. 01, CREATED)
DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)
DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE)
DE F40E10.4 PROTEIN (FRAGMENT).
GN F40E10.4.
OS CAENORHABDITIS ELEGANS.
OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA;
OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS.
RN [1]
RP SEQUENCE FROM N.A.
RA SMYE R.;
RL SUBMITTED (FEB-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.
RN [2]
RP SEQUENCE FROM N.A.
RX MEDLINE; 94150718.
RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,
RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,
RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,
RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,
RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,
RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,
RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,
RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,
RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,
RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;
RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.
RT elegans.";
RL NATURE 368:32-38(1994).
DR EMBL; Z69792; E1346469; -.
DR PROSITE; PS01187; EGF_CA; 1.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1
SQ SEQUENCE 601 AA; 66669 MW; F8A72773 CRC32;

Query Match 39.6%; Score 480; DB 5; Length 601;
Best Local Similarity 40.1%; Pred. No. 1.12e-69;
Matches 65; Conservative 33; Mismatches 56; Indels 8; Gaps 8;

Db 202 CMCSPGFTGNNCETNIDDCKNVECQNGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEY 261 

I I I |||:: III: |:| : || ||| ||:|| ||||| : 

Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLP-A- 61

Db 262 QKTDACQQSACGQG-ECVASQNSSDFTCKCHEGFSGPSCDRQMSVGFKNPGAYLALDPLA 320 

h :|: : I I :|| I : II ||:|| |:: :|| | : :|| : | 
Qy 62 PKS-PCEGTECQNGANCVD-QGNRP-VCQCLPGFGGPECEKLLSVNFVDRDTYLQFTDLQ 118 

Db 321 S-D-GTITMTLRTTSKIGILLYYGDDHFVSAELYDGRVKLVY 360 

: II: : |: Mill I : :: II |:|:: I 
Qy 119 NWXRXNITLQVFTAEDNGILLYNGGNDHIAVXLYXGHVRFSY 160



RESULT 6 

ID Q91902 PRELIMINARY; PRT; 721 AA. 

AC Q91902; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED)
DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE X-DELTA-1.
OS XENOPUS LAEVIS (AFRICAN CLAWED FROG).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; AMPHIBIA; BATRACHIA; ANURA;
OC MESOBATRACHIA; PIPOIDEA; PIPIDAE; XENOPODINAE; XENOPUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 95319507.
RA HENRIQUE D., ADAM J., MYAT A., CHITNIS A., LEWIS J., ISH-HOROWICZ D.;
RT "Expression of a Delta homologue in prospective neurons in the
RT chick.";
RL NATURE 375:787-790(1995).
RN [2]
RP SEQUENCE FROM N.A.
RX MEDLINE; 95319503.
RA CHITNIS A.B., HENRIQUE D., LEWIS J., ISH-HOROWICZ D., KINTNER C.R.;
RT "Primary neurogenesis in Xenopus embryos regulated by a homologue of
RT the Drosophila neurogenic gene Delta.";
RL NATURE 375:761-766(1995).
DR EMBL; L42229; G807696; -.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 2.
DR PFAM; PF00008; EGF; 6.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 721 AA; 79922 MW; 028040EF CRC32;

Query Match 26.6%; Score 322; DB 13; Length 721;
Best Local Similarity 46.5%; Pred. No. 2.37e-39;
Matches 46; Conservative 17; Mismatches 30; Indels 6; Gaps 4;

Db 431 CQCQEGFSGRNCDDNLDDCTSFPCQNGGTCQDGINDYSCTCPPGYIGKNCSMP--I-T-K 486

hi h:l II :l III lllh I I :| III |: II I I :| : : I 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 487 --CEHNPCHNGATCHERNNRYVCQCARGYGGNNCQFLLP 523 

II hill I :: II Mil Ml :|: II: 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLS 102 



RESULT 7 

ID Q25059 PRELIMINARY; PRT; 406 AA. 
AC Q25059; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED)
DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE FIBROPELLIN III (FRAGMENT).
OS HELIOCIDARIS ERYTHROGRAMMA (SEA URCHIN).
OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; EUECHINOIDEA;
OC ECHINACEA; ECHINOIDA; ECHINOMETRIDAE; HELIOCIDARIS.
RN [1]
RP SEQUENCE FROM N.A.
RA BISGROVE B.W.;
RL SUBMITTED (JUN-1995) TO EMBL/GENBANK/DDBJ DATA BANKS.
DR EMBL; L33862; G499688; -.
DR PROSITE; PS00577; AVIDIN; 1.
DR PROSITE; PS01186; EGF_2; 6.
DR PROSITE; PS01187; EGF_CA; 5.
DR PFAM; PF00008; EGF; 7.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1
SQ SEQUENCE 406 AA; 43475 MW; 45E6EE2C CRC32;

Query Match 25.5%; Score 309; DB 5; Length 406;
Best Local Similarity 44.2%; Pred. No. 6.34e-37;

Matches 42; Conservative 19; Mismatches 28; Indels 6; Gaps 4; 

Db 74 CNCIPGFDGDNCENNINECASNPCQNGGVCIDGVNGFVCTCQPGYTGTLCET--DI-D-- 128

hl:||: till :| ::| : III: |:| ||:: | | ||:| I : 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 129 E-CASNPCQNGGVCTDLVNMYTCDCLAGFTGSNCE 162 

I : lllh I I I |:||:|| |::|| 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98 



RESULT 8

ID Q90656 PRELIMINARY; PRT; 728 AA.
AC Q90656;
DT 01-NOV-1996 (TREMBLREL. 01, CREATED)
DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE TRANSMEMBRANE PROTEIN C-DELTA-1.
OS GALLUS GALLUS (CHICKEN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ARCHOSAURIA; AVES;
OC NEOGNATHAE; GALLIFORMES; PHASIANIDAE; PHASIANINAE; GALLUS.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=SPINAL CORD;
RX MEDLINE; 95319507.
RA HENRIQUE D., ADAM J., MYAT A., CHITNIS A., LEWIS J., ISH-HOROWICZ D.;
RT "Expression of a Delta homologue in prospective neurons in the
RT chick.";
RL NATURE 375:787-790(1995).
DR EMBL; U26590; G882412; -.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 2.
DR PFAM; PF00008; EGF; 6.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 728 AA; 79861 MW; 7439F575 CRC32;

Query Match 25.1%; Score 304; DB 13; Length 728;
Best Local Similarity 43.4%; Pred. No. 5.40e-36;
Matches 43; Conservative 21; Mismatches 29; Indels 6;

Db 436 CQCQAGFTGRHCDDNVDDCASFPCVNGGTCQDGVNDYSCTCPPGYNGKNCSTP--V-S-R 491

1:1 :|::| H :l III I II: | | || II |: ||:| | | ;;: 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63

Db 492 --CEHNPCHNGATCHERSNRYVCECARGYGGLNCQFLLP 528

II Ml | :::| I: hi :|: ||: 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLS 102



RESULT 9

ID Q90285 PRELIMINARY; PRT; 1476 AA.
AC Q90285; Q98847;
DT 01-FEB-1997 (TREMBLREL. 02, CREATED)
DT 01-FEB-1997 (TREMBLREL. 02, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE PUTATIVE EXTRACELLULAR AND CYTOPLASMIC FRAGMENT OF NOTCH-3 HOMOLOG
DE (FRAGMENT).
OS CARASSIUS AURATUS (GOLDFISH).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII;
OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA;
OC CYPRINIDAE; CYPRININAE; CARASSIUS.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=RETINA (10D POST-OPTIC NERVE CRUSH);
RA SULLIVAN S.A., BARTHEL L.K., LARGENT B.L., RAYMOND P.A.;
RL SUBMITTED (APR-1994) TO EMBL/GENBANK/DDBJ DATA BANKS.
DR EMBL; U09191; G1617281; -.
DR PROSITE; PS01186; EGF_2; 9.
DR PFAM; PF00008; EGF; 11.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
KW ANK REPEAT; GLYCOPROTEIN.
FT NON_TER 1 1
FT NON_TER 1476 1476
SQ SEQUENCE 1476 AA; 160385 MW; D6077129 CRC32;

Query Match 25.0%; Score 303; DB 13; Length 1476;
Best Local Similarity 44.8%; Pred. No. 8.28e-36;

Matches 43; Conservative 12; Mismatches 40; Indels 1; Gaps 1; 

Db 186 CDCMPGYEGDNCEREVNECQSHPCQNGGTCIDLVGHYIRSCPPGTLGVLCEINGDDCATP 245 

Mill IMI : ::|: I III: :| I hi II ::| 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHL-PAP 62 



Db 246 SWPRGMPKCQNNGTCVDRVGGYRCNCPPGFTGERCE 281

I III : III: I I III I II 
Qy 63 KSPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98



RESULT 10

ID Q25058 PRELIMINARY; PRT; 529 AA.
AC Q25058;
DT 01-NOV-1996 (TREMBLREL. 01, CREATED)
DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE FIBROPELLIN IA (FRAGMENT).
OS HELIOCIDARIS ERYTHROGRAMMA (SEA URCHIN).
OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; EUECHINOIDEA;
OC ECHINACEA; ECHINOIDA; ECHINOMETRIDAE; HELIOCIDARIS.
RN [1]
RP SEQUENCE FROM N.A.
RA BISGROVE B.W.;
RL SUBMITTED (JUN-1995) TO EMBL/GENBANK/DDBJ DATA BANKS.
DR EMBL; L33861; G499686; -.
DR PROSITE; PS00577; AVIDIN; 1.
DR PROSITE; PS01186; EGF_2; 10.
DR PROSITE; PS01187; EGF_CA; 7.
DR PFAM; PF00008; EGF; 10.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1
SQ SEQUENCE 529 AA; 55543 MW;

Query Match 24.8%; Score 301; DB 5; Length 529;
Best Local Similarity 43.2%; Pred. No. 1.95e-35;

Matches 41; Conservative 19; Mismatches 29; Indels 6; Gaps 2; 

Db 273 CSCVQGFTGSDCETNINECASGPCQNGGTCVDGVNGFVCQCPPNYTGTYCEIS-LDA- 328 

I I: hh :| | ::| ||||: hi lh: I h |:| III: I I 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 329 -CSSMPCQNGATCVNVGANYICECPPGFAGQNCE 361 

I : HIM II: I :|:| lh :|| 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98 



RESULT 11 

ID P87357 PRELIMINARY; PRT; 717 AA. 

AC P87357; 

DT 01-MAY-1997 (TREMBLREL. 03, CREATED)
DT 01-MAY-1997 (TREMBLREL. 03, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE DELTAD TRANSMEMBRANE PROTEIN PRECURSOR.
GN DELTAD.
OS BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII;
OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA;
OC CYPRINIDAE; RASBORINAE; DANIO.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 97346722.
RA DORNSEIFER P., TAKKE C., CAMPOS-ORTEGA J.A.;
RT "Overexpression of a zebrafish homologue of the Drosophila neurogenic
RT gene Delta perturbs differentiation of primary neurons and somite
RT development.";
RL MECH. DEV. 63:159-171(1997).
DR EMBL; Y11760; E307461; -.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 2.
DR PFAM; PF00008; EGF; 6.
KW SIGNAL; TRANSMEMBRANE; GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT SIGNAL 1 19 POTENTIAL.
FT CHAIN 20 717 DELTAD TRANSMEMBRANE PROTEIN.
SQ SEQUENCE 717 AA; 79061 MW; 5CC32ECA CRC32;

Query Match 24.6%; Score 298; DB 13; Length 717;
Best Local Similarity 40.6%; Pred. No. 7.00e-35;

Matches 41; Conservative 22; Mismatches 32; Indels 6; Gaps 5; 

Db 427 CQCPEGFTGTHCEDNIDECATYPCQNGGTCQDGLSDYTCTCPPGYTGKNC-TSA-V-N-K 482 

1:1 l::l :l :.l hi Mil: I I |:| I: I :: : I 
Qy 4 CECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPK 63 

Db 483 --CLHNPCHNGATCHEMDNRYVCACIPGYGGRNCQFLLPEN 521

I Ml I : II II M M : ! I M ||: 
Qy 64 SPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECEKLLSVN 104



RESULT 12

ID O88278 PRELIMINARY; PRT; 3313 AA.

AC O88278;

DT 01-NOV-1998 (TREMBLREL. 08, CREATED)

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE MEGF2.

GN MEGF2.

OS RATTUS NORVEGICUS (RAT).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;

OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;

RX MEDLINE; 98360089.

RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;

RT "Identification of high-molecular-weight proteins with multiple 

RT EGF-like motifs by motif-trap screening.";

RL GENOMICS 51:27-34(1998).

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN (BY SIMILARITY). 

DR EMBL; AB011528; D1033422; -.

DR PROSITE; PS00232; CADHERIN; 7. 

KW CELL ADHESION; GLYCOPROTEIN; TRANSMEMBRANE; CALCIUM-BINDING; REPEAT. 

SQ SEQUENCE 3313 AA; 359348 MW; 5D724643 CRC32; 

Query Match 24.5%; Score 297; DB 11; Length 3313; 

Best Local Similarity 32.7%; Pred. No. 1.07e-34; 

Matches 54; Conservative 39; Mismatches 59; Indels 13; Gaps 9;

Db 1411 RCRCPPGFTGDFCETELDLCYSNPCRNGGACARREGGYTCVCRPRFTGEDCELDTE--AG 1468 




II I 1 1 ■ ■ 1 1 I • I i ■ I ■ 1 1 • I \\: I 

Qy 3 RCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAP 62

Db 1469 R--CVPGVCRNGGTCTNAPNGGFRCQCPAGGAFEGPRCE-VAARSF-PPSSFVMFRGLRQ 1524



: I Mi: I ■ I 1 1 1 - 1 I 1 1 1 1 • • :i ::: I i: 
Qy 63 KSPCEGTECQNGANCVDQGNRPV-CQCLPG--FGGPECEKLLSVNFVDRDTYLQFTDLQN 119 

Db 1525 RFHLTLSLSFATVQPSGLLFYNGRLNEKHDFLALELVAGQVRLTY 1569 

: : ::l I : :|:|:||| h I :|: I |:||::| 
Qy 120 WXRXNITLQVFTAEDNGILLYNGG-ND-H--IAVXLYXGHVRFSY 160



RESULT 13 

ID Q06008 PRELIMINARY; PRT; 1203 AA. 

AC Q06008; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED)

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE NOTCH PROTEIN HOMOLOG 2 (MOTCH B PROTEIN) (FRAGMENT). 

GN NOTCH2 OR MOTCH B. 

OS MUS MUSCULUS (MOUSE) . 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=F1 (CBA X C57BL); TISSUE=WHOLE EMBRYO;

RX MEDLINE; 93178563. 

RA LARDELLI M., LENDAHL U.; 

RT "Motch A and motch B--two mouse Notch homologues coexpressed in a

RT wide variety of tissues.";

RL EXP. CELL RES. 204:364-372(1993).

DR EMBL; X68279; G287990; -.

DR MGD; MGI:97364; NOTCH2.

DR PFAM; PF00008; EGF; 27. 

DR PFAM; PF00066; notch; 1. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT.

FT NON_TER 1 1

FT NON_TER 1203 1203

SQ SEQUENCE 1203 AA; 128982 MW; A5A95551 CRC32; 

Query Match 24.3%; Score 294; DB 11; Length 1203; 

Best Local Similarity 44.8%; Pred. No. 3.85e-34; 

Matches 43; Conservative 17; Mismatches 31; Indels 5; Gaps 4; 

Db 855 RCECVPGYQGVNCEYEVDECQNQPCQNGGTCIDLVNHFKCSCPPGTRGLLCE-ENI-D- 910 

lllhll! I II : hi::: lllh hi II : I h I I III 
Qy 3 RCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAP 62 

Db 911 ECA-GGPHCLNGGQCVDRIGGYTCRCLPGFAGERCE 945 

Mill: III: I : I i I ] I : I 
Qy 63 KSPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98 



RESULT 14 

ID Q92566 PRELIMINARY; PRT; 2408 AA. 

AC Q92566;

DT 01-FEB-1997 (TREMBLREL. 02, CREATED) 

DT 01-FEB-1997 (TREMBLREL. 02, LAST SEQUENCE UPDATE) 



DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE MYELOBLAST KIAA0279 (FRAGMENT).

GN KIAA0279. 

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 

OC CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=BRAIN;

RX MEDLINE; 97191544. 

RA NAGASE T., SEKI N., ISHIKAWA K., OHIRA M., KAWARABAYASI Y., OHARA O.,

RA TANAKA A., KOTANI H., MIYAJIMA N., NOMURA N.;

RT "Prediction of the coding sequences of unidentified human genes. VI. 

RT The coding sequences of 80 new genes (KIAA0201-KIAA0280) deduced by 

RT analysis of cDNA clones from cell line KG-1 and brain."; 

RL DNA RES. 3:321-329(1996). 

DR EMBL; D87469; D1014097; -. 

DR PROSITE; PS01186; EGF_2; 4.

DR PFAM; PF00008; EGF; 6.

DR PFAM; PF00028; cadherin; 5.

DR PFAM; PF00054; laminin_G; 1.

KW GLYCOPROTEIN.

FT NON_TER 1 1

SQ SEQUENCE 2408 AA; 261740 MW; CDBA2001 CRC32; 

Query Match 24.3%; Score 295; DB 4; Length 2408; 

Best Local Similarity 30.8%; Pred. No. 2.52e-34; 

Matches 44; Conservative 40; Mismatches 49; Indels 10; Gaps 9; 

Db 758 RCRCPPGFTGDYCETEVDLCYSRPCGPHGRCRSREGGYTCLCRDGYTGEHCEVSAR-SG- 815 

II I M::|| I : I I : I ::| :|:||| :||:|: II:::: :: 
Qy 3 RCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAP 62 

Db 816 R--CTPGVCKNGGTCVNLLVGGFKCDCPSGDFEKPYCQ-VTTRSFPAH-SFITFRGLRQR 871

: I III: II: hi :| I I h : : :| : ::: I h : 
Qy 63 KSPCEGTECQNGANCVDQGNRPV-CQCLPG-FGGPECEKLLSVNFVDRDTYLQFTDLQNW 120 

Db 872 FHFTLALS-FATKERDGLLLYNG 893

: ::| |:: : :|:||||| 
Qy 121 XRXNITLQVFTAED-NGILLYNG 142 



RESULT 15 

ID O35516 PRELIMINARY; PRT; 2470 AA.

AC O35516;

DT 01-JAN-1998 (TREMBLREL. 05, CREATED)

DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE CELL SURFACE PROTEIN. 

GN NOTCH2. 

OS MUS MUSCULUS (MOUSE) . 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=C57B/6; TISSUE=THYMUS;

RX MEDLINE; 93178563. 

RA LARDELLI M., LENDAHL U.; 

RT "Motch A and motch B--two mouse Notch homologues coexpressed in a 

RT wide variety of tissues."; 

RL EXP. CELL RES. 204:364-372(1993).

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=C57B/6; TISSUE=THYMUS;

RA HAMADA Y., HIGUCHI M., TSUJIMOTO Y.; 

RL SUBMITTED (JUL-1994) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; D32210; D1022953; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS01186; EGF_2; 27.

DR PROSITE; PS01187; EGF_CA; 22.

DR PFAM; PF00008; EGF; 34.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 2.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 2470 AA; 265325 MW; CA94E03A CRC32;

Query Match 24.3%; Score 294; DB 11; Length 2470;

Best Local Similarity 44.8%; Pred. No. 3.85e-34;

Matches 43; Conservative 17; Mismatches 31; Indels 5; Gaps 4; 

Db 1170 RCECVPGYQGVNCEYEVDECQNQPCQNGGTCIDLVNHFKCSCPPGTRGLLCE-ENI-D- 1225 

111:111 I II : |:|::: Mil: |:| II : | |: | | II :: 
Qy 3 RCECMPGYAGDNCSENQDDCRDHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAP 62 

Db 1226 ECA-GGPHCLNGGQCVDRIGGYTCRCLPGFAGERCE 1260 

: I I II: III: |:|lll|:| II 
Qy 63 KSPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECE 98 
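
Each alignment row carries a start coordinate, a run of residues with "-" for gaps, and an end coordinate, so the number of residue letters should equal end minus start plus one. The sketch below checks a row on that basis; the helper name is hypothetical and the check is only a transcription sanity test, not part of the search program.

```python
def row_is_consistent(start, residues, end):
    """True if the residue letters in a row account for start..end.

    Gap characters ("-") and stray spaces are ignored; only letters
    are counted as residues.
    """
    ungapped = sum(1 for ch in residues if ch.isalpha())
    return ungapped == end - start + 1


# The final Db/Qy rows of the alignment above.
print(row_is_consistent(1226, "ECA-GGPHCLNGGQCVDRIGGYTCRCLPGFAGERCE", 1260))
print(row_is_consistent(63, "KSPCEGTECQNGANCVDQGNRPVCQCLPGFGGPECE", 98))
```

A row that fails this check has usually lost or gained a character, for example a gap symbol read as a letter or a doubled letter merged into one.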



Search completed; Fri May 28 08:56:06 1999 
Job time : 55 secs.
Release 3.1A John F. Collins, Biocomputing Research Unit. 
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 08:58:00 1999; MasPar time 6.52 Seconds

335.829 Million cell updates/sec

Tabular output not generated. 



Title:          MJS-09-191-647-6
Description:    (1-103) from US09191647.pep
Perfect Score:  813
                1 QCHISDQGEPYCLCQPGFSG GSSFVEEVERHLECGCLACS 103

Scoring table:  PAM 150
                Gap 11

170751 seqs, 21266608 residues 
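
The "Million cell updates/sec" figure can be cross-checked from values in this header, assuming one cell update per (query residue, database residue) pair, which is the usual meaning for a Smith-Waterman matrix. This is a sketch under that assumption; the function name is hypothetical, and small differences from the printed rate are expected because the reported time is rounded.

```python
def million_cell_updates_per_sec(query_len, db_residues, seconds):
    """Dynamic-programming cells filled per second, in millions.

    Assumption: one cell per pairing of a query residue with a
    database residue, with no cells skipped.
    """
    return query_len * db_residues / seconds / 1e6


# 103-residue query, 21266608 database residues, 6.52 s of MasPar time.
rate = million_cell_updates_per_sec(103, 21_266_608, 6.52)
print(f"{rate:.1f} million cell updates/sec")
```

The result is within a fraction of a percent of the rate printed in the header.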



Post-processing: Minimum Match 0%

Listing first 45 summaries 

Database: a-geneseq35 

1:part1 2:part2 3:part3 4:part4 5:part5 6:part6 7:part7 
8:part8 9:part9 10:part10 11:part11 12:part12 13:part13 
14:part14 15:part15 16:part16 17:part17 18:part18 
19:part19 20:part20 21:part21 22:part22 23:part23 
24:part24 25:part25 26:part26 27:part27 28:part28 
29:part29 30:part30 31:part31 32:part32 33:part33 
34:part34 35:part35 36:part36 37:part37 38:part38 
39:part39 

Statistics: Mean 27.967; Variance 116.252; scale 0.241 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



                                SUMMARIES

Result            Query
   No.  Score   Match  Length  DB  ID        Description             Pred. No.

     1    415    51.0    1534  30  W46966    Amino acid sequence o   5.16e-30
     2    139    17.1    1480   5  R25079    Drosophila SLIT prote   7.60e-04
     3    132    16.2    1872  36  W68510    Partial human Notch-3   3.08e-03
     4    132    16.2    2321  36  W49698    Human Notch3 protein.   3.08e-03
     5    121    14.9     379   5  R25565    Beta-IG-M1.             2.68e-02
     6    118    14.5    1055  29  W44298    Human serrate 2 prote   4.81e-02
     7    118    14.5    1212  29  W44299    Human serrate 2.        4.81e-02
     8    118    14.5    1257  19  W05834    Human Serrate-2 (HJ2)   4.81e-02
     9    116    14.3     381  26  W35730    Human cysteine rich p   7.09e-02
    10    116    14.3     381  26  W35957    Human monocyte mature   7.09e-02
    11           13.3     179  37  W75100    Human secreted protei   3.28e
    12           13.2     375  16  R90919    Connective tissue gro   3.97e
    13           12.9     193   3  P60463    Sequence of C-terminu   5.79e
    14           12.9    2050  38  W73499    Von Willebrand factor   5.79e
    15           12.9    2813   3  P60462    Sequence of human von   5.79e
    16           12.7     612  28  W39256    Human partial mature    8.43e
    17    103    12.7     737  28  W39257    Human membrane protei   8.43e-01
    18    102    12.5    2813   3  P60053    Sequence of von Wille   [illegible]
    19    101    12.4     685  37  W80813    [illegible]             [illegible]
    20    100    12.3     157  21  W11730    H-Delta-1 polypeptide   1.48e+00
    21    100    12.3     520  25  W18348    Proliferation and dif   1.48e+00
    22    100    12.3     660  21  W11725    ?-Delta-1 polypeptide   1.48e+00
    23    100    12.3     702  25  W18349    Proliferation and dif   1.48e+00
    24    100    12.3     723  25  W18353    Proliferation and dif   1.48e+00
    25     99    12.2     727  21  W11719    ?-Delta-1 polypeptide   1.78e+00
    26     99    12.2     740  21  W00876    C-Delta-1 polypeptide   1.78e+00
    27     99    12.2    1036  25  W18351    Proliferation and dif   1.78e+00
    28     99    12.2    1187  25  W18352    Proliferation and dif   1.78e+00
    29     99    12.2    1208  28  W40827    Human Jagged protein,   1.78e+00
    30     99    12.2    1218  19  W05833    Human Serrate-1 (HJ1)   1.78e+00
    31     99    12.2    1218  29  W44301    Human serrate 1.        1.78e+00
    32     99    12.2    1218  25  W18354    Proliferation and dif   1.78e+00
    33     98    12.1      32  20  W12907    Modified ligand speci   2.14e+00
    34     98    12.1     374  27  W37497    Human TMP-2.            2.14e+00
    35     98    12.1     374  22  W07663    Human transforming gr   2.14e+00
    36     98    12.1    1193  19  W05835    Chick Serrate.          2.14e+00
    37     95    11.7     385  10  R56167    Neuroendocrine tumor    3.72e+00
    38     95    11.7     833   6  R28960    Delta Dll.              3.72e+00
    39     94    11.6      33  20  W12909    Modified ligand speci   4.47e+00
    40     94    11.6     208  16  R92897    Human HBEGF precursor   4.47e+00
    41     94    11.6     208  13  R66190    Diphtheria toxin (DT)   4.47e+00
    42     94    11.6     208  16  R92898    Monkey HBEGF precurso   4.47e+00
    43     94    11.6     208  16  R80787    Monkey precursor hepa   4.47e+00
    44     94    11.6     333  16  R92900    HBEGF Val Met saporin   4.47e+00
    45     94    11.6     334  16  R92921    Saporin-HBEGF protein   4.47e+00



RESULT 1

ID W46966 standard; Protein; 1534 AA.

AC W46966;

DT 06-JUL-1998 (first entry)

DE Amino acid sequence of a human slit-like polypeptide.

KW Slit-like protein; human; diagnosis; treatment; brain-specific disease;

KW cancer; antibody.

OS Homo sapiens.

FH Key Location/Qualifiers

FT Peptide 1..26

FT /note= "signal peptide"

FT Protein 27..1534

FT /note= "mature protein"

PN J10087699-A.

PD 07-APR-1998.

PF 15-JUL-1997; 205351.

PR 16-JUL-1996; JP-186219.

PA (ASAH ) ASAHI KASEI KOGYO KK.

DR WPI; 98-267127/24.

DR N-PSDB; V16978.

PT Human Slit-like protein - useful for diagnosis and treatment of

PT brain-specific diseases and cancers

PS Disclosure; Pages 31-35; 45pp; Japanese.

CC The present sequence represents a novel human slit-like protein (the

CC mature protein is claimed in Claim 1). The slit-like polypeptide is

CC useful for diagnosis and treatment of brain-specific diseases and

CC cancers. Antibodies directed against the protein, or its fragments

CC can also be used for diagnosing cancer.

SQ Sequence 1534 AA;

Query Match 51.0%; Score 415; DB 30; Length 1534;

Best Local Similarity 49.0%; Pred. No. 5.16e-30;

Matches 51; Conservative 22; Mismatches 30; Indels 1; Gaps 1;

Db 1431 hcqasgtkgahcvcdpgfsgelceqesecrgdpvrdfhqvqrgyaicqttrplswvecrg 1490 

:h I : hhlllll! |:||: I h II: : hill I |: :: :|||| 
Qy 1 QCHISDQGEPYCLCQPGFSGEHCQQENPCLGQVVREVIRRQKGYASCATASKVPIMECRG 60

Db 1491 scpgqgccqglrlkrrkftfecsdgtsfaeevekptkcgcalca 1534 
: I III I MM: |:|:||:|| ||||: I! h 
Qy 61 GC-GPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGCLACS 103



RESULT 2 

ID R25079 standard; Protein; 1480 AA. 

AC R25079; 

DT 05-JAN-1993 (first entry) 

DE Drosophila SLIT protein involved in axon pathway development. 

KW Neurogenesis; EGF-like repeats; epidermal growth factor; TAGON; 

KW embryonic CNS; leucine-rich repeat; Flank-LRR-Flank; 

KW midline glial cells; axonogenesis; cell-cell interaction; ss. 

OS Drosophila melanogaster. 

FH Key Location/Qualifiers 

FT peptide 1..36

FT /label= signal

FT domain 73..294

FT /label= Flank_LRR_Flank_1

FT /note= "mediates adhesive events"

FT domain 295..518

FT /label= Flank_LRR_Flank_2

FT /note= "mediates adhesive events"

FT domain 519..714

FT /label= Flank_LRR_Flank_3

FT /note= "mediates adhesive events"

FT domain 715..910

FT /label= Flank_LRR_Flank_4

FT /note= "mediates adhesive events"

FT region 911..1150

FT /label= Tandem_EGF_like_repeats

FT /note= "involved in protein-protein interactions"

FT region 1353..1393

FT /label= 7th_EGF_like_repeat

FT /note= "involved in receptor-ligand interactions"

FT region 1394..1404

FT /label= alternative_splice_segment

FT /note= "developmentally regulated"

FT region 1405..1480

FT /label= C-terminal_region

PN WO9210518-A.

PD 25-JUN-1992.

PF 27-NOV-1991; D09055.

PR 07-DEC-1990; US-624135.

PA (UYYA ) UNIV YALE.

PI Artavanis-Tsakonas S, Rothberg JM; 

DR WPI; 92-234590/28. 

DR N-PSDB; Q25811. 

PT SLIT protein and sequence elements for treating 

PT neuro-degenerative disease - useful for Alzheimer's disease, 

PT nerve damage and Parkinson's disease, for diagnosis of cancer

PS Claim 1; Page 84-89; 122pp; English.

CC The SLIT protein is necessary for normal development of the midline

CC of the CNS, partic. the midline glial cells, and for the

CC concomitant formation of the commisural axon pathways. The process

CC is dependent on the level of SLIT protein expression. It appears

CC that SLIT protein is excreted by the midline glial cells where it

CC is synthesised and is eventually associated with the surfaces of

CC axons that traverse them. The SLIT protein is tightly localised to

CC the muscle attachment sites and to the sites of contact between

CC adjacent pairs of cardioblasts as they coalesce to form the lumen

CC of the larval heart. The SLIT protein defines a new set of

CC molecules (TAGONS) which play a key role in axon outgrowth and

CC pathfinding. SLIT can be used as a nerve regenerative in

CC neurodegenerative diseases such as Alzheimer's Disease, spinal cord

CC injuries, brain injuries, crushed (optic) nerve, amyotrophic lateral

CC sclerosis, diabetes-caused nerve damage, Parkinson's Disease,

CC strokes, epilepsy, multiple sclerosis, paraplegia, retinal

CC degeneration and facial nerve damage. The 4 "Flank-Leucine-rich

CC region-Flank" domains, the C-terminal region and part of the

CC alternative splice segment (i.e. GEGSTEPFTVT) are all individually

CC claimed as are molecules comprising at least 1 Flank-LRR-Flank domain

CC and at least 1 EGF-like repeat element from the SLIT protein. 

CC See also R29102. 

SQ Sequence 1480 AA; 



Query Match 17.1%; Score 139; DB 5; Length 1480; 

Best Local Similarity 35.7%; Pred. No. 7.60e-04; 

Matches 15; Conservative 9; Mismatches 18; Indels 0; Gaps 0;

Db 1434 cvggcgnqccaakivrrrkvrmvcsnnrkyiknldivrkcgc 1475 

I llll III : :MI h: :: ::: III 
Qy 58 CRGGCGPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99 
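
The "Query Match" percentage looks like the hit's score expressed as a share of the Perfect Score given in the run header (813 for this query). The report does not spell this out, so treat the following as a sketch under that assumption; the function name is hypothetical.

```python
PERFECT_SCORE = 813  # "Perfect Score" from this run's header


def query_match_percent(score, perfect_score=PERFECT_SCORE):
    """Score as a percentage of the query's self-match score, one decimal."""
    return round(100.0 * score / perfect_score, 1)


# Scores reported for results 1 and 2 of this search.
for score in (415, 139):
    print(score, query_match_percent(score))
```

Because the percentage is rounded to one decimal, the relation can also be run in reverse to narrow down a score that is missing from a summary row.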



RESULT 3 

ID W68510 standard; Protein; 1872 AA. 

AC W68510; 

DT 06-JAN-1999 (first entry) 

DE Partial human Notch-3 protein.

KW Human; Notch3; transmembrane receptor; lateral inhibition; regulation; 

KW developmental cascade; neurogenic gene; mutant; neurological disorder; 

KW cerebral autosomal dominant arteriopathy; subcortical infarct; CADASIL; 

KW leukoencephalopathy. 

OS Homo sapiens. 



FH Key Location/Qualifiers

FT Misc_difference 328

FT /note= "encoded by NAN"

FT Misc_difference 401

FT /note= "encoded by GNN"

FT Misc_difference 403

FT /note= "encoded by GNC"

FT Misc_difference 406

FT /note= "encoded by GNN"

FT Misc_difference 409

FT /note= "encoded by NNT"

FT Misc_difference 420

FT /note= "encoded by GNC"

FT Misc_difference 706

FT /note= "encoded by NNN"

FT Misc_difference 708

FT /note= "encoded by CCN"

FT Misc_difference 719

FT /note= "encoded by CGN"

FT Misc_difference 728

FT /note= "encoded by CNT"

FT Misc_difference 729

FT /note= "encoded by GTN"

FT Misc_difference 759..789

FT /note= "encoded by NNN"

FT Misc_difference 1425

FT /note= "encoded by GNA"

PN FR2751985-A1.

PD 06-FEB-1998.

PF 01-AUG-1996; 009733.

PR 01-AUG-1996; FR-009733. 

PA (INRM ) INSERM INST NAT SANTE S RECH MEDICALE. 

PI Bach JF, Bousser MG, Joutel A, Tournier Lasserve E; 

DR WPI; 98-133137/13.

DR N-PSDB; V57163. 

PT Human Notch3 nucleic acids - and methods for identifying 

PT pre-disposition to cerebral autosomal dominant arteriopathy with 

PT sub-cortical infarcts and leukoencephalopathy 

PS Claim 2; Fig la-lg; 42pp; French. 

CC This sequence represents a partial human notch3 protein, a transmembrane 

CC receptor protein involved in lateral inhibition and regulating 

CC developmental cascades of neurogenic genes. Mutated Notch3 proteins 

CC are thought to be involved in neurological disorders, especially of the 

CC cerebral autosomal dominant arteriopathy with subcortical infarcts and 

CC leukoencephalopathy (CADASIL) type. Blocking expression of a mutated 

CC Notch3 gene or by substitution therapy with non-mutated Notch3 gene or

CC protein can be used to treat CADASIL or related disorders.

SQ Sequence 1872 AA; 

Query Match 16.2%; Score 132; DB 36; Length 1872; 

Best Local Similarity 45.5%; Pred. No. 3.08e-03;

Matches 15; Conservative 9; Mismatches 7; Indels 2; Gaps 2;

Db 994 qc-vdedsshycvcpegrtgshceqevdpclaq 1025

II : Ihl I :| Ihll :|lhl 
Qy 1 QCHISDQGEPYCLCQPGFSGEHCQQE-NPCLGQ 32 



RESULT 4 

ID W49698 standard; Protein; 2321 AA. 

AC W49698; 

DT 21-DEC-1998 (first entry) 

DE Human Notch3 protein. 

KW Human; Notch3; transmembrane receptor; lateral inhibition; regulation; 

KW developmental cascade; neurogenic gene; mutant; neurological disorder; 

KW cerebral autosomal dominant arteriopathy; subcortical infarct; CADASIL; 

KW leukoencephalopathy; therapy. 

OS Homo sapiens.

PN FR2751986-A1.

PD 06-FEB-1998.

PF 16-APR-1997; 004680.

PR 01-AUG-1996; FR-009733.

PA (INRM ) INSERM INST NAT SANTE S RECH MEDICALE.

PI Bach JF, Bousser MG, Joutel A, Tournier Lasserve E; 

DR WPI; 98-133138/13. 

DR N-PSDB; V57001. 

PT Human Notch3 nucleic acids - and methods for identifying 

PT pre-disposition to cerebral autosomal dominant arteriopathy with 

PT sub-cortical infarcts and leukoencephalopathy 

PS Claim 2; Fig 1.1-1.8; 45pp; French. 

CC This sequence represents the human Notch3 protein, a transmembrane 

CC receptor protein involved in lateral inhibition and regulating 

CC developmental cascades of neurogenic genes. Mutated Notch3 proteins 

CC are thought to be involved in neurological disorders, especially of 

CC the cerebral autosomal dominant arteriopathy with subcortical infarcts 

CC and leukoencephalopathy (CADASIL) type. Blocking expression of a 

CC mutated Notch3 gene or by substitution therapy with non-mutated Notch3

CC gene or protein can be used to treat CADASIL or related disorders. 

SQ Sequence 2321 AA; 

Query Match 16.2%; Score 132; DB 36; Length 2321; 

Best Local Similarity 45.5%; Pred. No. 3.08e-03;

Matches 15; Conservative 9; Mismatches 7; Indels 2; Gaps 2; 

Db 1060 qc-vdedsshycvcpegrtgshceqevdpclaq 1091 

II : Ihl 1 :| Ihll :|lhl 
Qy 1 QCHISDQGEPYCLCQPGFSGEHCQQE-NPCLGQ 32 
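
The three statistics lines that precede each alignment follow a regular "Label value;" pattern, so they can be read into a dictionary for checking or tabulation. The following is a minimal sketch; the function name and the choice to return every value as a float are assumptions made for illustration.

```python
import re


def parse_stats(text):
    """Return {label: float} for each 'Label value;' pair in the text.

    A trailing percent sign is dropped; exponents such as 3.08e-03
    are handled by float().
    """
    fields = {}
    for part in text.replace("\n", " ").split(";"):
        m = re.match(r"\s*(.+?)\s+([-+.\deE%]+)\s*$", part)
        if m:
            fields[m.group(1)] = float(m.group(2).rstrip("%"))
    return fields


block = """Query Match 16.2%; Score 132; DB 36; Length 2321;
Best Local Similarity 45.5%; Pred. No. 3.08e-03;
Matches 15; Conservative 9; Mismatches 7; Indels 2; Gaps 2;"""
stats = parse_stats(block)
print(stats["Score"], stats["Pred. No."])
```

Values that fail to parse as numbers, such as "3,08e-Q3", raise a ValueError, which is a convenient way to locate digits damaged in transcription.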

RESULT 5

ID R25565 standard; Protein; 379 AA.

AC R25565; 

DT 18-JAN-1993 (first entry) 

DE Beta-IG-M1.

KW Transforming growth factor beta; induced; CEF-10; v-src; chicken; 

KW embryo; fibroblasts; TGF-beta.

OS Mus musculus. 

PN EP-495674-A. 

PD 22-JUL-1992. 

PF 17-JAN-1992; 300429. 

PR 18-JAN-1991; US-642991.

PR 10-JAN-1992; US-816270. 

PA (BRIM ) BRISTOL-MYERS SQUIBB CO. 

PI Brunner AM, Chinn J, Neubauer MG, Purchio AF; 

DR WPI; 92-243508/30. 

DR N-PSDB; Q26421.

PT TGF-beta induced gene family - encodes proteins involved in

PT growth and differentiation effects of TGF-beta-1 

PS Claim 2; Fig 1; 35pp; English. 

CC The protein sequence was deduced from the DNA sequence obtd. by

CC screening a cDNA library made from AKR-2B mouse cells induced with

CC TGF-beta1 and cyclohexamide with two probes from untreated AKR-2B

CC mRNA and AKR-2B mRNA from cells treated with cyclohexamide and TGF-

CC beta1. The proteins encoded by hybridising colonies (beta-IG-M1 and

CC beta-IG-M2) contain 38 Cys residues and are induced by TGF-beta1.

CC Beta-IG-M1 displays 80 percent homology to the CEF-10 protein

CC induced by v-src in chicken embryo fibroblasts and is identical

CC to the protein encoded by cyr61, an immediate early response gene

CC induced in quiescent BALB 3T3 cells by serum treatment. Residues

CC 49-56 of beta-IG-M1 conform to the GCGCCXXC motif reported in the

CC amino half of insulin-like growth factor (IGF) binding proteins.

CC The C-terminal Cys rich region of beta-IG-M1, -M2 and CEF-10 contain

CC an amino acid sequence with strong homology to a motif found near the

CC C-terminal of the malarial circumsporozoite (CS) protein, which is

CC highly conserved among all species of malarial parasites sequenced

CC to date (designated region II). This motif is also found in

CC other proteins which have cell adhesive properties that mediate 

CC cell-cell and cell-extracellular matrix interactions, such as 

CC properdin, thrombospondin, and TRAP. The proteins encoded by 

CC TGF-beta induced genes are likely to be involved in mediation of 

CC the biological effects of TGF-beta relating to cell growth and 

CC differentiation. See also R25566. 

SQ Sequence 379 AA; 

Query Match 14.9%; Score 121; DB 5; Length 379; 

Best Local Similarity 33.3%; Pred. No. 2.68e-02; 

Matches 19; Conservative 11; Mismatches 25; Indels 2; Gaps 2; 

Db 298 yagcssvkkyrpkyc-gscvdgrcctplqtrtvkmrfrcedgemfsknvmmiqsckc 353 

1 1 : 1 : : I Ihl :|l I ::: I hi II I :| II 
Qy 44 YASCATASKVPIMECRGGC-GPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99 
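
The database entries quoted in this report use a two-letter line code (ID, AC, DT, DE, KW, OS, PN, CC and so on) followed by the field text. As an illustration, the sketch below groups the lines of one entry by code; the function name is hypothetical, and lines without a recognisable code (alignment rows, RESULT headers, blank lines) are simply skipped.

```python
from collections import defaultdict


def parse_entry(lines):
    """Group flat-file lines by their two-letter upper-case line code."""
    record = defaultdict(list)
    for line in lines:
        line = line.strip()
        # A coded line is two capitals, a space, then the field text.
        if len(line) < 3 or not line[:2].isupper() or line[2] != " ":
            continue
        record[line[:2]].append(line[3:].strip())
    return dict(record)


entry = parse_entry([
    "RESULT 5",
    "ID R25565 standard; Protein; 379 AA.",
    "AC R25565;",
    "KW Transforming growth factor beta; induced; CEF-10; v-src; chicken;",
    "KW embryo; fibroblasts; TGF-beta.",
    "OS Mus musculus.",
    "Db 298 yagcssvkkyrpkyc",
])
print(entry["AC"], len(entry["KW"]))
```

Grouping by code also makes it easy to see where a transcription has dropped a code, since the affected text then goes missing from the record.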



RESULT 6 

ID W44298 standard; Protein; 1055 AA. 

AC W44298; 

DT 19-JUN-1998 (first entry) 

DE Human serrate 2 protein fragment. 

KW Human; serrate 2; regulation; stem cell; differentiation; neoplasm; 

KW leukaemia; endothelial cell; tumour. 

OS Homo sapiens. 

PN WO9802458-A1. 

PD 22-JAN-1998. 

PF 11-JUL-1997; J02414.

PR 14-MAY-1997; JP-124063. 

PR 16-JUL-1996; JP-186220. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 98-110528/10. 

DR N-PSDB; V15181. 

PT Human serrate-2 gene expression products - used to regulate stem

PT cell differentiation, useful in treating neoplasms, e.g. leukaemia

PS Claim 2; Page 57-62; 103pp; Japanese.

CC The present sequence represents a human serrate 2 protein fragment. The

CC present invention also describes a method for the preparation of the 

CC polypeptides, and antibodies binding to the polypeptide and its 

CC fragments. The polypeptide and its fragments expressed by the serrate-2

CC gene can be used to inhibit stem (especially blood stem) cell 

CC differentiation and to inhibit endothelial cell growth. They may be 

CC incorporated in a cell culture media for culturing undifferentiated 

CC stem cells. They can also be used for treatment of neoplasms such as

CC leukaemia. The antibodies can be used for the diagnosis of malignant

CC tumours.

SQ Sequence 1055 AA; 

Query Match 14.5%; Score 118; DB 29; Length 1055; 

Best Local Similarity 36.8%; Pred. No. 4.81e-02; 

Matches 14; Conservative 13; Mismatches 8; Indels 3; Gaps 3; 

Db 585 rc-vsqpggnfscicdsgftgtycheniddclgqpcrn 621 

:| =h I, : I : I : : 1 1 " I |::: : Mil h 
Qy 1 QCHISDQGEPY-CLCQPGFSGEHCQQE-NPCLGQVVRE 36



RESULT 7 

ID W44299 standard; Protein; 1212 AA. 
AC W44299; 

DT 19-JUN-1998 (first entry) 
DE Human serrate 2. 

KW Human; serrate 2; regulation; stem cell; differentiation; neoplasm;

KW leukaemia; endothelial cell; tumour. 

OS Homo sapiens. 

PN WO9802458-A1. 

PD 22-JAN-1998. 

PF 11-JUL-1997; J02414.

PR 14-MAY-1997; JP-124063.

PR 16-JUL-1996; JP-186220.

PA (ASAH ) ASAHI KASEI KOGYO KK.

PI Itoh A, Sakano S; 

DR WPI; 98-110528/10. 

DR N-PSDB; V15181. 

PT Human serrate-2 gene expression products - used to regulate stem 

PT cell differentiation, useful in treating neoplasms, e.g. leukaemia

PS Claim 3; Page 62-68; 103pp; Japanese. 

CC The present sequence represents human serrate 2. The present invention 

CC also describes a method for the preparation of the polypeptides, and 

CC antibodies binding to the polypeptide and its fragments. The polypeptide 

CC and its fragments expressed by the serrate-2 gene can be used to inhibit

CC stem (especially blood stem) cell differentiation and to inhibit

CC endothelial cell growth. They may be incorporated in a cell culture

CC media for culturing undifferentiated stem cells. They can also be used

CC for treatment of neoplasms such as leukaemia. The antibodies can be used

CC for the diagnosis of malignant tumours. 

SQ Sequence 1212 AA; 

Query Match 14.5%; Score 118; DB 29; Length 1212;

Best Local Similarity 36.8%; Pred. No. 4.81e-02;

Matches 14; Conservative 13; Mismatches 8; Indels 3; Gaps 3; 

Db 585 rc-vsqpggnfscicdsgftgtycheniddclgqpcrn 621 

:| :h I : IMMH I::: : Ml |: 
Qy 1 QCHISDQGEPY-CLCQPGFSGEHCQQE-NPCLGQVVRE 36



RESULT 8

ID W05834 standard; Protein; 1257 AA.

AC W05834;

DT 28-JAN-1997 (first entry)

DE Human Serrate-2 (HJ2).

KW Serrate-2; human jagged-2; HJ2; Notch; cell differentiation;

KW cell fate; central nervous system; cancer; tissue repair; therapy;

KW diagnosis; antibody.

OS Homo sapiens.

FH Key Location/Qualifiers

FT domain 1..912

FT /label= Extracellular_domain

FT /note= "a deletion in the encoding cDNA clone

FT results in loss of part of the Serrate-2

FT signal peptide and beginning of the DSL

FT domain"

FT domain 26..70

FT /label= DSL

FT /note= "region of homology with Drosophila Delta

FT and Serrate, predicted to mediate binding

FT with Notch"

FT domain 75..735

FT /label= ELR

FT /note= "epidermal growth factor-like repeat domain"

FT region 75..105

FT /label= ELR1

FT region 106..140

FT /label= ELR2

FT region 141..180

FT /label= ELR3

FT region 181..218

FT /label= ELR4

FT region 219..256

FT /label= ELR5

FT region 257..294

FT /label= ELR6

FT region 295..331

FT /label= ELR7


FT region 332..369

FT /label= ELR8

FT region 370..407

FT /label= ELR9

FT region 408..435

FT /label= Partial_ELR

FT region 436..469

FT /label= Partial_ELR

FT region 470..507

FT /label= ELR10

FT region 508..545

FT /label= ELR11

FT region 546..584

FT /label= ELR12

FT region 585..622

FT /label= ELR13

FT region 623..660

FT /label= ELR14

FT region 664..701

FT /label= ELR15

FT region 702..718

FT /label= Partial_ELR

FT region 719..735

FT /label= Partial_ELR

FT domain 913..933

FT /label= Transmembrane_domain

FT domain 934..1257

FT /label= intracellular_domain

PN WO9627610-A1. 

PD 12-SEP-1996. 

PF 07-MAR-1996; U03172. 

PR 07-MAR-1995; US-400159.

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY.

PA (UYYA ) UNIV YALE.

PI Artavanis-Tsakonas S, Gray GE, Henrique DMP, Ish-Horowicz D; 

PI Lewis JH, Mann RS, Myat AM; 

DR WPI; 96-425379/42. 

DR N-PSDB; W05834.

PT Vertebrate Serrate protein and related DNA - used to treat or

PT prevent malignancies characterised by increased Notch activity.

PS Claim 5; Page 104-107; 161pp; English. 

CC Human Serrate-1 (W05833) and human Serrate-2 (W05834) are ligands

CC for the zygotic neurogenic locus Notch, and are believed to play a

CC major role in determining cell fates (differentiation) in the

CC central nervous system. Their amino acid sequences were deduced

CC from cDNA clones (see also T40090-91) isolated from human foetal 

CC brain cDNA libraries. The proteins, antibodies raised to them, 

CC and encoding nucleic acids can be used in the detection of 

CC Serrate sequences and in the treatment of disorders of cell fate 

CC or differentiation, partic. cancer, nervous system disorders

CC and in tissue repair or regeneration. 

SQ Sequence 1257 AA; 

Query Match 14.5%; Score 118; DB 19; Length 1257; 

Best Local Similarity 36.8%; Pred. No. 4.81e-02;

Matches 14; Conservative 13; Mismatches 8; Indels 3; Gaps 3; 

Db 441 rc-vsqpggnfscicdsgftgtycheniddclgqpcrn 477 

:| M I : MMM I::: : Ml |: 

Qy 1 QCHISDQGEPY-CLCQPGFSGEHCQQE-NPCLGQVVRE 36



RESULT 9 

ID W35730 standard; Protein; 381 AA. 

AC W35730; 

DT 27-MAR-1998 (first entry) 

DE Human cysteine rich protein 61 (Cyr61). 

KW Cysteine rich protein 61; Cyr61; human; 

KW extracellular matrix signalling molecule; cell adhesion; 

KW cell migration; cell proliferation; angiogenesis; chondrogenesis; 

KW oncogenesis; haematostasis; wound healing; organ regeneration. 

OS Homo sapiens. 

PN WO9733995-A2.

PD 18-SEP-1997.

PF 14-MAR-1997; U04193. 

PR 15-MAR-1996; US-013958. 

PA (MUNI-) MUNIN CORP. 

PI Lau LF; 

DR WPI; 97-470875/43. 

DR N-PSDB; T94699. 

PT Isolated and purified cysteine rich protein 61/Cyr61 - useful to

PT modulate e.g. haematostasis, induce wound healing, promote organ 

PT regeneration etc 

PS Claim 2; Page 112-113; 133pp; English. 

CC This protein sequence comprises human cysteine rich protein 61 

CC (Cyr61), an extracellular matrix signalling molecule. Its amino 

CC acid sequence was deduced from a human placental cDNA clone (see 

CC T94699). Cyr61 polypeptides can be expressed in transformed or 

CC transfected host cells. Cyr61 can be used to modulate 

CC haematostasis, induce wound healing in a tissue, promote organ

CC regeneration, improve tissue grafting or promote bone or prosthesis

CC implantation (claimed). It can also be used to screen for a

CC modulator of angiogenesis, chondrogenesis, oncogenesis, cell

CC adhesion, cell migration, cell proliferation, expand a population 

CC of undifferentiated haematopoietic stem cells in culture and to 

CC screen for a mitogen (claimed). Ex vivo methods for using 

CC mammalian extracellular matrix signalling molecules to prepare 

CC blood products are also provided. 

SQ Sequence 381 AA; 

Query Match 14.3%; Score 116; DB 26; Length 381; 

Best Local Similarity 33.3%; Pred. No. 7.09e-02; 

Matches 19; Conservative 10; Mismatches 26; Indels 2; Gaps 2; 

Db 300 yagclsvkkyrpkyc-gscvdgrcctpqltrtvkmrfrcedgetfsknvmmiqsckc 355 

11:1 : I : I 1:1 :ll I :: I |:| II :| :| II 
Qy 44 YASCATASKVPIMECRGGC-GPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99 






RESULT 10 

ID W35957 standard; Protein; 381 AA. 
AC W35957; 

DT 05-MAR-1998 (first entry) 

DE Human monocyte mature differentiation factor. 

KW Human; monocyte; mature; differentiation factor; MMDF; macrophage; 

KW cancer; immune activator; tissue culture; infectious disease. 

OS Homo sapiens. 

PN J09234079-A. 
PD 09-SEP-1997. 
PF 04-MAR-1996; 075236. 
PR 04-MAR-1996; JP-075236. 
PA (TOYM ) TOYOBO KK. 
DR WPI; 97-497320/46. 
DR N-PSDB; T97142. 
PT A monocyte mature differentiation factor - useful for the long term 
PT tissue culture of macrophage(s) 
PS Claim 9; Page 12-13; 22pp; Japanese. 
CC The present sequence represents a monocyte mature differentiation 
CC factor (MMDF) which maintains the life of macrophages for long periods 
CC in liquid culture. MMDF can be used as an anti-cancer agent, an immune 
CC activator and to treat infectious diseases. 
SQ Sequence 381 AA; 






Query Match 14.3%; Score 116; DB 26; Length 381; 

Best Local Similarity 33.3%; Pred. No. 7.09e-02; 

Matches 19; Conservative 10; Mismatches 26; Indels 2; Gaps 2; 

Db 300 yagclsvkkyrpkyc-gscvdgrcctpqltrtvkmrfrcedgetfsknvmmiqsckc 355 

11:1 : I = I 1:1 :|| I :: I |:| It :| :| I | 
Qy 44 YASCATASKVPIMECRGGC-GPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99 



RESULT 11 

ID W75100 standard; Protein; 179 AA. 
AC W75100; 

DT 28-JAN-1999 (first entry) 



DE Human secreted protein encoded by gene 44 clone HE8CJ26. 
KW Human; secreted protein; fusion protein; gene therapy; protein therapy; 
KW diagnosis; tissue; cancer; tumour; neurodegenerative disorder; leukaemia; 
KW developmental abnormality; foetal deficiency; blood; allergy; renal; 
KW immune system; asthma; lymphocytic disease; brain; hepatic; lymphoma; 
KW inflammation; ischaemic shock; Alzheimer's disease; restenosis; AIDS; 
KW cognitive disorder; schizophrenia; prostate; obesity; osteoclast; thymus; 
KW osteoporosis; arthritis; testis; lung; thyroiditis; thyroid; digestion; 
KW endocrine; metabolism; regulation; malabsorption; gastritis; neoplasm. 
OS Homo sapiens. 
FH Key Location/Qualifiers 
FT Misc-difference 179 
FT /label= unknown 
PN WO9839446-A2. 



PD 11-SEP-1998. 
PF 06-MAR-1998; U04492. 
PR 07-MAR-1997; US-038621. 
PR 07-MAR-1997; US-040161. 
PR 07-MAR-1997; US-[illegible]. 
PR 07-MAR-1997; US-040163. 
PR 07-MAR-1997; US-040333. 
PR 07-MAR-1997; US-040334. 
PR 07-MAR-1997; US-040336. 
PR 07-MAR-1997; US-[illegible]. 
PR 11-APR-1997; US-043311. 
PR 11-APR-1997; US-043312. 
PR 11-APR-1997; US-043313. 
PR 11-APR-1997; US-043314. 
PR 11-APR-1997; US-043315. 
PR 11-APR-1997; US-043568. 
PR 11-APR-1997; US-043569. 
PR 11-APR-1997; US-043576. 
PR 11-APR-1997; US-043578. 
PR 11-APR-1997; US-043580. 
PR 11-APR-1997; US-043669. 
PR 11-APR-1997; US-043670. 
PR 11-APR-1997; US-043671. 
PR 11-APR-1997; US-043672. 
PR 11-APR-1997; US-043674. 
PR 23-MAY-1997; US-047492. 
PR 23-MAY-1997; US-047500. 
PR 23-MAY-1997; US-047501. 
PR 23-MAY-1997; US-047502. 
PR 23-MAY-1997; US-047503. 
PR 23-MAY-1997; US-047581. 
PR 23-MAY-1997; US-047582. 
PR 23-MAY-1997; US-047583. 
PR 23-MAY-1997; US-047584. 
PR 23-MAY-1997; US-047585. 
PR 23-MAY-1997; US-047586. 
PR 23-MAY-1997; US-047587. 
PR 23-MAY-1997; US-047588. 
PR 23-MAY-1997; US-047589. 
PR 23-MAY-1997; US-047590. 
PR 23-MAY-1997; US-047592. 
PR 23-MAY-1997; US-047593. 
PR 23-MAY-1997; US-047594. 
PR 23-MAY-1997; US-047595. 
PR 23-MAY-1997; US-047596. 
PR 23-MAY-1997; US-047597. 
PR 23-MAY-1997; US-047598. 
PR 23-MAY-1997; US-047599. 
PR 23-MAY-1997; US-047600. 
PR 23-MAY-1997; US-047601. 
PR 23-MAY-1997; US-047612. 
PR 23-MAY-1997; US-047613. 
PR 23-MAY-1997; US-047614. 
PR 23-MAY-1997; US-047615. 
PR 23-MAY-1997; US-047617. 
PR 23-MAY-1997; US-047618. 
PR 23-MAY-1997; US-047632. 
PR 23-MAY-1997; US-047633. 
PR 06-JUN-1997; US-048964. 
PR 06-JUN-1997; US-048974. 
PR 22-AUG-1997; US-056630. 
PR 22-AUG-1997; US-056631. 
PR 22-AUG-1997; US-056632. 
PR 22-AUG-1997; US-056636. 
PR 22-AUG-1997; US-056637. 
PR 22-AUG-1997; US-056662. 
PR 22-AUG-1997; US-056664. 
PR 22-AUG-1997; US-056845. 
PR 22-AUG-1997; US-056862. 
PR 22-AUG-1997; US-056864. 
PR 22-AUG-1997; US-056872. 
PR 22-AUG-1997; US-056874. 
PR 22-AUG-1997; US-056875. 
PR 22-AUG-1997; US-056876. 
PR 22-AUG-1997; US-056877. 
PR 22-AUG-1997; US-056878. 
PR 22-AUG-1997; US-056879. 
PR 22-AUG-1997; US-056880. 
PR 22-AUG-1997; US-056881. 
PR 22-AUG-1997; US-056882. 
PR 22-AUG-1997; US-056884. 
PR 22-AUG-1997; US-056886. 
PR 22-AUG-1997; US-056887. 
PR 22-AUG-1997; US-056888. 
PR 22-AUG-1997; US-056889. 
PR 22-AUG-1997; US-056892. 
PR 22-AUG-1997; US-056893. 
PR 22-AUG-1997; US-056894. 
PR 22-AUG-1997; US-056903. 
PR 22-AUG-1997; US-056908. 
PR 22-AUG-1997; US-056909. 
PR 22-AUG-1997; US-056910. 
PR 22-AUG-1997; US-056911. 
PR 05-SEP-1997; US-057650. 
PR 05-SEP-1997; US-057761. 



PA (HUMA-) HUMAN GENOME SCI INC. 
PI Bednarik DP, Brewer LA, Carter KC, Duan R, Ebner R, Endress GA, 
PI Feng P, Ferrie AM, Fischer CL, Graves KA, Greene JM, Hu JS, 
PI Kyaw H, Lafleur DW, Li Y, Moore PA, Ni J, Olsen HS, Rosen CA, 
PI Ruben SM, Shi Y, Soppet DR, Young PE, Yu GL, Zeng Z; 
DR WPI; 98-609887/51. 
DR N-PSDB; V34197. 
PT New isolated human genes and the secreted polypeptides they encode 
PT - useful for diagnosis and treatment of e.g. cancers, neurological 
PT disorders, immune diseases, inflammation or blood disorders 
PS Claim 1; Page 305; 447pp; English. 
CC This sequence represents a secreted human protein encoded by the gene 
CC clone detailed in the descriptor line. 
CC The gene can be used to generate fusion proteins by linking the gene 
CC to a human immunoglobulin Fc portion (e.g. V34145) for increasing the 
CC stability of the fused protein as compared to the human protein only. 
CC The invention relates to 70 novel genes and their fragments (nucleic acid 
CC sequences: V34154-V34276; amino acid sequences W75057-W75179) which 
CC are useful for preventing, treating or ameliorating medical conditions 
CC e.g. by protein or gene therapy. Also, pathological conditions can be 
CC diagnosed by determining the amount of the new polypeptides in a sample 
CC or by determining the presence of mutations in the new polynucleotides. 
CC Specific uses are described for each of the 70 polynucleotides, based on 
CC which tissues they are most highly expressed in (see V34154 for described 
CC uses). 
SQ Sequence 179 AA; 

Query Match 13.3%; Score 108; DB 37; Length 179; 

Best Local Similarity 40.6%; Pred. No. 3.28e-01; 

Matches 13; Conservative 8; Mismatches 9; Indels 2; Gaps 2; 



Db 121 chmlsr-dtyectcqvgftgkecqwtdaclsh 151 
II: : : I I II Ihl II ::||:: 
Qy 2 CHISDQGEPY-CLCQPGFSGEHCQQENPCLGQ 32 



RESULT 12 

ID R90919 standard; Protein; 375 AA. 

AC R90919; 

DT 25-JUN-1996 (first entry) 

DE Connective tissue growth factor-2. 

KW CTGF-2; connective tissue growth factor-2; secreted protein; 

KW cartilagenous growth; skeletal; embryo; cell growth; morphogenesis; 

KW insulin-like growth factor; fibroblast growth factor; Cry61. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT peptide 1..24 

FT /label= signal_peptide 

FT protein 25..375 

FT /label= mature_protein 

PN WO9601896-A. 

PD 25-JAN-1996. 

PF 12-JUL-1994; U07736. 

PR 12-JUL-1994; WO-U07736. 

PA (HUMA-) HUMAN GENOME SCI INC. 

PI Adams MD, Li H; 

DR WPI; 96-097626/10. 

DR N-PSDB; T12653. 

PT Connective tissue growth factor-2 and DNA encoding it - useful to 

PT enhance the repair of connective and support tissue, and to enhance 

PT wound healing 

PS Claim 1; Fig 1A-C; 46pp; English. 

CC Connective tissue growth factor-2 (CTGF-2) is encoded by a cDNA 

CC (T12653) isolated from a human foetal lung cDNA library. The CTGF 

CC polypeptides are structurally and functionally related to a family 

CC of growth factors which include IGF (Insulin-like growth factor), 

CC PDGF (platelet-derived growth factor), and FGF (fibroblast growth 

CC factor). CTGF-2 exhibits 89 percent identity and 93 percent similarity 

CC to Cry61. Cry61 is a growth factor-inducible immediate early gene 

CC initially identified in serum-stimulated mouse fibroblasts. It encodes 

CC a member of an emerging family of secreted proteins which are also a 

CC group of cysteine-rich proteins. This group of GFs are important for 

CC normal growth, differentiation, morphogenesis of the cartilaginous 

CC skeleton of an embryo and cell growth. 

SQ Sequence 375 AA; 

Query Match 13.2%; Score 107; DB 16; Length 375; 

Best Local Similarity 34.7%; Pred. No. 3.97e-01; 

Matches 17; Conservative 9; Mismatches 21; Indels 2; Gaps 2; 

Db 301 yagclsvkkyrpkyc-gscvdgrcctpqltrtvkmrfpcedgetfsknv 348 

Ihl : I : I 1:1 :|| I :: I II II :| :| 
Qy 44 YASCATASKVPIMECRGGC-GPQCCQPTRSKRRKYVFQCTDGSSFVEEV 91 



RESULT 13 

ID P60463 standard; Protein; 193 AA. 

AC P60463; 

DT 25-JUN-1991 (first entry) 

DE Sequence of C-terminus of von Willebrand Factor (VWF). 

KW Chronic renal failure; therapy; factor VIII C. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT modified_site 15 

FT /label= Potential N-glycosylation site 

FT modified_site 170 

FT /label= Potential N-glycosylation site 

PN WO8606096-A. 

PD 23-OCT-1986. 

PF 10-APR-1986; U00760. 

PR 11-APR-1985; US-722108. 

PA (CHIL-) CHILDRENS MED CENT. 

PA (GINS/) GINSBURG D. 

PI Ginsburg D, Orkin SH, Kaufman RJ; 

DR WPI; 86-291663/44. 

DR N-PSDB; N60405. 

PT Pure Von Willebrand Factor - produced using an expression vector 

PT including a DNA sequence encoding functional VWF protein 

PS Disclosure; Table 1, Page 9; 54pp; English. 

CC cDNA clones pVWH33, pVWH5 and PVWE6 which span 9 kb pairs of DNA and 

CC encompass the entire protein coding region of VWF, were selected to 

CC construct full length cDNA (N60404). The pure VWF produced is useful 

CC in the treatment of von Willebrand's disease (VWD) and the patients 

CC with chronic renal failure whose abnormal bleeding times are 

CC corrected by crude cryoprecipitate. Pure VWF can also be used to 

CC carry, stabilise and improve the therapeutic efficacy of factor 

CC VIII:C. 

SQ Sequence 193 AA; 

Query Match 12.9%; Score 105; DB 3; Length 193; 

Best Local Similarity 41.2%; Pred. No. 5.79e-01; 

Matches 14; Conservative 5; Mismatches 15; Indels 0; Gaps 0; 

Db 153 ccsptrtepmqvalhctngsvvyhevlnameckc 186 

II III: ::||:H II HI I 
Qy 66 CCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99 

RESULT 14 

ID W73499 standard; Protein; 2050 AA. 
AC W73499; 

DT 26-FEB-1999 (first entry) 
DE Von Willebrand factor. 

KW Von Willebrand factor; GPlb binding domain; inhibitor; re-occlusion; 

KW platelet aggregation; cerebrovascular disorder; cardiovascular disorder; 

KW angioplasty; thrombi-containing platelet-rich aggregate; thrombosis; 

KW therapy. 

OS Homo sapiens. 

PN US5849536-A. 

PD 15-DEC-1998. 

PF 30-NOV-1994; 347594. 

PR 01-MAR-1991; WO-U01416. 

PR 02-MAR-1990; US-487767. 

PR 03-SEP-1991; US-753815. 

PR 22-JUN-1993; US-080690. 

PR 30-NOV-1994; US-347594. 

PA (BIOT-) BIO-TECHNOLOGY GENERAL CORP. 

PI Garfinkel L, Richter T; 
DR WPI; 99-069733/06. 
DR N-PSDB; V08901. 

PT Polypeptide comprising von Willebrand factor GPlb binding domain - 
PT useful as platelet aggregation inhibitor 
PS Disclosure; Fig 12; 62pp; English. 

CC This sequence represents the mature human Von Willebrand factor protein. 
CC The GPlb binding domain of this sequence represents the protein of the 
CC invention. The protein is used for inhibiting platelet aggregation, 
CC especially for treating cerebrovascular disorders. It is also used for 
CC treating cardiovascular disorders, especially acute myocardial infarction 
CC or angina. The protein is also used for inhibiting platelet aggregation 
CC before, during or after angioplasty, thrombolytic treatment or coronary 
CC bypass surgery, for maintaining blood vessel potency before, during or 
CC after coronary bypass surgery. It can also be used for inhibiting 
CC thrombosis, optionally associated with an inflammatory response, for 
CC inhibiting platelet adhesion to damaged vascular surfaces, for preventing 
CC platelet adhesion to prosthetic materials or devices, for inhibiting 
CC re-occlusion after angioplasty or thrombolysis, or for thrombolytic 
CC treatment of thrombi-containing platelet-rich aggregates. 
SQ Sequence 2050 AA; 

Query Match 12.9%; Score 105; DB 38; Length 2050; 

Best Local Similarity 41.2%; Pred. No. 5.79e-01; 

Matches 14; Conservative 5; Mismatches 15; Indels 0; Gaps 0; 

Db 2010 ccsptrtepmqvalhctngsvvyhevlnameckc 2043 

II III: ::lhll II :M I 
Qy 66 CCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99 



RESULT 15 

ID P60462 standard; Protein; 2813 AA. 
AC P60462; 
DT 25-JUN-1991 (first entry) 
DE Sequence of human von Willebrand Factor (VWF) precursor. 
KW Chronic renal failure; therapy; factor VIII C. 
OS Homo sapiens. 
PN WO8606096-A. 
PD 23-OCT-1986. 
PF 10-APR-1986; U00760. 
PR 11-APR-1985; US-722108. 
PA (CHIL-) CHILDRENS MED CENT. 
PA (GINS/) GINSBURG D. 
PI Ginsburg D, Orkin SH, Kaufman RJ; 
DR WPI; 86-291663/44. 
DR N-PSDB; N60404. 
PT Pure Von Willebrand Factor - produced using an expression vector 
PT including a DNA sequence encoding functional VWF protein 
PS Disclosure; Table 2, Pages 18-36A; 54pp; English. 
CC cDNA clones pVWH33, pVWH5 and PVWE6 which span 9 kb pairs of DNA and 
CC encompass the entire protein coding region of VWF, were selected to 
CC construct full length cDNA (N60404). The pure VWF produced is useful 
CC in the treatment of von Willebrand's disease (VWD) and the patients 
CC with chronic renal failure whose abnormal bleeding times are 
CC corrected by crude cryoprecipitate. Pure VWF can also be used to 
CC carry, stabilise and improve the therapeutic efficacy of factor 
CC VIII:C. 
SQ Sequence 2813 AA; 

Query Match 12.9%; Score 105; DB 3; Length 2813; 
Best Local Similarity 41.2%; Pred. No. 5.79e-01; 
Matches 14; Conservative 5; Mismatches 15; Indels 0; Gaps 0; 

Db 2773 ccsptrtepmqvalhctngsvvyhevlnameckc 2806 
II III: ::ll:ll II :ll I 
Qy 66 CCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99 



Search completed: Fri May 28 08:58:51 1999 
Job time : 51 secs. 

Title:          US-09-191-647-6 
Description:    (1-103) from US09191647.pep 
Perfect Score:  813 
Sequence:       1 QCHISDQGEPYCLCQPGFSG     GSSFVEEVERHLECGCLACS 103 

Scoring table:  PAM 150 
                Gap 11 

122810 seqs, 40068593 residues 

Post-processing: Minimum Match 0% 
                 Listing first 45 summaries 

Database:       pir60 
                1:pir1 2:pir2 3:pir3 4:pir4 

Statistics:     Mean 36.826; Variance 62.401; scale 0.590 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
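
The "Query Match" column in the summaries is not defined in the report. The printed values are consistent with each hit's score expressed as a percentage of the Perfect Score (813 for this query). The sketch below assumes that relationship; the function name is hypothetical.

```python
def query_match_percent(score, perfect_score):
    """Hit score as a percentage of the query's perfect (self) score.

    Assumed definition, inferred from the printed figures only.
    """
    if perfect_score <= 0:
        raise ValueError("perfect_score must be positive")
    return round(100.0 * score / perfect_score, 1)


# Top hit in this listing: score 150 against a perfect score of 813.
print(query_match_percent(150, 813))  # 18.5
```

This gives a quick consistency check when an OCR-damaged percentage (for example a value printed with a comma or an asterisk) has to be restored from the score.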

                                SUMMARIES 

Result         Query 
   No.  Score  Match  Length  DB  ID      Description            Pred. No. 
---------------------------------------------------------------------------
     1    150   18.5    1469   2  B36665  slit protein 2 precur  5.93e-13 
     2    139   17.1    1480   2  A36665  slit protein 1 precur  8.26e-11 
     3    132   16.2    2321   2  S78549  notch3 protein - huma  1.79e-09 
     4    129   15.9     375   2  A41428  CEF-10 protein precur  6.59e-09 
     5    121   14.9     379   2  A35669  gene CYR61 protein pr  2.01e-07 
     6    116   14.3    5147   1  IJFFTM  cadherin-related tumo  1.63e-06 
     7    113   13.9     293   2  B26637  neurogenic repetitive  5.62e-06 
     8    113   13.9    2139   2  A35672  crumbs protein - frui  5.62e-06 
     9    113   13.9    2437   2  S42612  transmembrane protein  5.62e-06 
    10    112   13.8    2318   2  S45306  notch 3 protein - mou  8.46e-06 
    11    111   13.7    2531   2  S18188  notch protein homolog  1.27e-05 
    12    111   13.7    2531   2  A46019  gene Notch-1 protein   1.27e-05 
    13    110   13.5     706   2  H71707  hypothetical protein   1.91e-05 
    14    110   13.5    2471   2  A49128  cell-fate determining  1.91e-05 
    15    108   13.3    1203   2  A49175  Motch B protein - mou  4.29e-05 
    16    108   13.3    2524   2  A35844  Xotch protein - Afric  4.29e-05 
    17    107   13.2     530   2  A31640  epidermal growth fact  6.41e-05 
    18    105   12.9    2813   1  VWHU    von Willebrand factor  1.42e-04 
    19    105   12.9    3707   2  S18252  heparan sulfate prote  1.42e-04 
    20    105   12.9    4543   1  A53102  alpha-2-macroglobulin  1.42e-04 
    21    103   12.7     638   2  S08042  proteoglycan core pro  3.14e-04 
    22    103   12.7    2703   2  A24420  notch protein - fruit  3.14e-04 
    23    102   12.5     387   2  B49175  Motch A protein - mou  4.65e-04 
    24    102   12.5     570   2  A48836  fibropellin C precurs  4.65e-04 
    25    102   12.5     603   2  S28941  coagulation factor XI  4.65e-04 
    26    102   12.5     615   1  KFHU12  coagulation factor XI  4.65e-04 
    27    102   12.5     861   2  A48825  Notch homolog Motch p  4.65e-04 
    28    102   12.5    2555   2  A40043  notch protein homolog  4.65e-04 
    29    101   12.4    1042   2  A57534  mucin (clone L31) - h  6.87e-04 
    30    101   12.4    1056   2  A53767  tracheobronchial muci  6.87e-04 
    31    101   12.4    4391   2  A38096  perlecan precursor -   6.87e-04 
    32     99   12.2     427   2  JC4915  ags protein precursor  1.49e-03 
    33     99   12.2     728   2  150719  C-Delta-1 - chicken    1.49e-03 
    34     99   12.2    1220   2  A56136  jagged protein precur  1.49e-03 
    35     96   11.8     383   2  B45484  delta-like dlk homeot  4.70e-03 
    36     96   11.8     383   2  S53716  homeotic protein dlk   4.70e-03 
    37     96   11.8     463   2  A36479  milk fat globule memb  4.70e-03 
    38     96   11.8    3712   2  S18253  laminin alpha-1 chain  4.70e-03 
    39     95   11.7     385   2  S53718  homeotic protein dlk   6.87e-03 
    40     95   11.7     385   2  A54785  preadipocyte factor 1  6.87e-03 
    41     95   11.7     832   2  A31246  neurogenic protein De  6.87e-03 
    42     95   11.7     833   2  S19087  gene Delta protein pr  6.87e-03 
    43     95   11.7     880   2  S00670  gene Delta protein pr  6.87e-03 
    44     95   11.7    1064   2  A40136  fibropellin Ia - sea   6.87e-03 
    45     95   11.7    3566   2  A40701  tenascin-x precursor   6.87e-03 


RESULT 1 

ENTRY B36665 #type complete 
TITLE slit protein 2 precursor - fruit fly (Drosophila melanogaster) 
ORGANISM #formal_name Drosophila melanogaster 
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change 16-Dec-1998 
ACCESSIONS B36665 
REFERENCE A36665 
 #authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.; Artavanis-Tsakonas, S. 
 #journal Genes Dev. (1990) 4:2169-2187 
 #title slit: an extracellular protein necessary for development of 
        midline glia and commissural axon pathways contains both 
        EGF and LRR domains. 
 #cross-references MUID:91099665 
 #accession B36665 
 ##status preliminary 
 ##molecule_type mRNA 
 ##residues 1-1469 ##label ROT 
 ##cross-references GB:X53959 
GENETICS 
 #gene FlyBase:sli 
 ##cross-references FlyBase:FBgn0003425 
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF 
        homology; leucine-rich alpha-2-glycoprotein repeat 
        homology; proteoglycan carboxyl-terminal homology 
FEATURE 
 66-91 #domain proteoglycan amino-terminal homology #label PAH1\ 
 101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\ 
 125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\ 
 149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\ 
 173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\ 
 197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\ 
 228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\ 
 288-313 #domain proteoglycan amino-terminal homology #label PAH2\ 
 323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\ 
 347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\ 
 371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\ 
 395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\ 
 419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\ 
 450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\ 
 512-537 #domain proteoglycan amino-terminal homology #label PAH3\ 
 547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\ 
 572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\ 
 596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\ 
 620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\ 
 651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\ 
 708-733 #domain proteoglycan amino-terminal homology #label PAH4\ 
 743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\ 
 767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\ 
 846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\ 
 1028-1061 #domain EGF homology #label EGF 
SUMMARY #length 1469 #molecular-weight 164695 #checksum 8361 

Query Match 18.5%; Score 150; DB 2; Length 1469; 
Best Local Similarity 29.9%; Pred. No. 5.93e-13; 
Matches 29; Conservative 18; Mismatches 44; Indels 6; Gaps 6; 

Db 1372 SNARDGYQCKCKHGQRGRYCDQAASTCRKEQVREYYT-ENDCRSRQPL-KYA-K-CVGGC 1427 
h MM I I hi : I : III :: I I : M III 
Qy 5 SDQGEPY-CLCQPGFSGEHCQQENP-CLGQVVREVIRRQKGYASCATASKVPIMECRGGC 62 

Db 1428 GNQCCAAKIVRRRKVRMVCSNNRKYIKNLDIVRKCGC 1464 
I III : :||| I:: :: ::: III 
Qy 63 GPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99 



RESULT 2 

ENTRY A36665 #type complete 
TITLE slit protein 1 precursor - fruit fly (Drosophila melanogaster) 
ORGANISM #formal_name Drosophila melanogaster 
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change 24-Sep-1998 
ACCESSIONS A36665; S13523 
REFERENCE A36665 
 #authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.; Artavanis-Tsakonas, S. 
 #journal Genes Dev. (1990) 4:2169-2187 
 #title slit: an extracellular protein necessary for development of 
        midline glia and commissural axon pathways contains both 
        EGF and LRR domains. 
 #cross-references MUID:91099665 
 #accession A36665 
 ##status preliminary 
 ##molecule_type mRNA 
 ##residues 1-1480 ##label ROT 
 ##cross-references GB:X53959; NID:g8614; PID:g8615 
GENETICS 
 #gene FlyBase:sli 
 ##cross-references FlyBase:FBgn0003425 
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF 
        homology; leucine-rich alpha-2-glycoprotein repeat 
        homology; proteoglycan carboxyl-terminal homology 
KEYWORDS alternative splicing 



FEATURE 
 66-91 #domain proteoglycan amino-terminal homology #label PAH1\ 
 101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\ 
 125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\ 
 149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\ 
 173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\ 
 197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\ 
 228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\ 
 288-313 #domain proteoglycan amino-terminal homology #label PAH2\ 
 323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\ 
 347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\ 
 371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\ 
 395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\ 
 419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\ 
 450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\ 
 512-537 #domain proteoglycan amino-terminal homology #label PAH3\ 
 547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\ 
 572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\ 
 596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\ 
 620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\ 
 651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\ 
 708-733 #domain proteoglycan amino-terminal homology #label PAH4\ 
 743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\ 
 767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\ 
 791-814 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR17\ 
 815-838 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR18\ 
 846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\ 
 1028-1061 #domain EGF homology #label EGF 
SUMMARY #length 1480 #molecular-weight 165751 #checksum 900 



Query Match 17.1%; Score 139; DB 2; Length 1480; 
Best Local Similarity 35.7%; Pred. No. 8.26e-11; 
Matches 15; Conservative 9; Mismatches 18; Indels 0; Gaps 0; 



Db 1434 CVGGCGNQCCAAKIVRRRKVRMVCSNNRKYIKNLDIVRKCGC 1475 

I llll III : Ml M :: ::: III 
Qy 58 CRGGCGPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99 



RESULT 3 

ENTRY S78549 #type complete 
TITLE notch3 protein - human 
ORGANISM #formal_name Homo sapiens #common_name man 
DATE 24-Jul-1998 #sequence_revision 24-Jul-1998 #text_change 17-Mar-1999 
ACCESSIONS S78549; S71825 
REFERENCE S78549 
 #authors Joutel, A.; Tournier-Lasserve, E. 
 #submission submitted to the EMBL Data Library, April 1997 
 #accession S78549 
 ##molecule_type mRNA 
 ##residues 1-2321 ##label JOU1 
 ##cross-references EMBL:097669; NID:g2668591; PID:g2668592 
REFERENCE S71825 
 #authors Joutel, A.; Corpechot, C.; Ducros, A.; Vahedi, K.; Chabriat, 
        H.; Mouton, P.; Alamowitch, S.; Domenga, V.; Cecillion, M.; 
        Marechal, E.; Maciazek, J.; Vayssiere, C.; Cruaud, C.; 
        Cabanis, E.A.; Ruchoux, M.M.; Weissenbach, J.; Bach, J.F.; 
        Bousser, M.G.; Tournier-Lasserve, E. 
 #journal Nature (1996) 383:707-710 
 #title Notch3 mutations in CADASIL, a hereditary adult-onset 
        condition causing stroke and dementia. 
 #cross-references MUID:97032728 
 #accession S71825 
 ##status nucleic acid sequence not shown 
 ##molecule_type DNA 
 ##residues 67-113;138-194;268-333,'G',335-346;536-613;716-765; 
        1240-1279;1815-1888 ##label JOU2 
 ##cross-references EMBL:097669 
GENETICS 
 #gene notch3 
 #map_position 19p13.1 
FUNCTION 
 #description may be involved in pathogenesis of CADASIL, causing a type of 
        stroke and dementia 
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin 
        repeat homology; EGF homology 
KEYWORDS tandem repeat; transmembrane protein 
FEATURE 
 318-349 #domain EGF homology #label EGF\ 
 1838-1870 #domain ankyrin repeat homology #label AN1\ 
 1871-1903 #domain ankyrin repeat homology #label AN2\ 
 1905-1937 #domain ankyrin repeat homology #label AN3\ 
 1938-1970 #domain ankyrin repeat homology #label AN4\ 
 1971-2003 #domain ankyrin repeat homology #label AN5 
SUMMARY #length 2321 #molecular-weight 243657 #checksum 3337 

Query Match 16.2%; Score 132; DB 2; Length 2321; 

Best Local Similarity 45.5%; Pred. No. 1.79e-09; 

Matches 15; Conservative 9; Mismatches 7; Indels 2; Gaps 2; 

Db 1060 QC-VDEDSSHYCVCPEGRTGSHCEQEVDPCLAQ 1091 
II : ::: Ihl I :l Ihll :|||:| 
Qy 1 QCHISDQGEPYCLCQPGFSGEHCQQE-NPCLGQ 32 



RESULT 4 

ENTRY A41428 #type complete 
TITLE CEF-10 protein precursor - chicken 
ORGANISM #formal_name Gallus gallus #common_name chicken 
DATE 03-Apr-1992 #sequence_revision 03-Apr-1992 #text_change 10-Sep-1997 
ACCESSIONS A41428 
REFERENCE A41428 
 #authors Simmons, D.L.; Levy, D.B.; Yannoni, Y.; Erikson, R.L. 
 #journal Proc. Natl. Acad. Sci. U.S.A. (1989) 86:1178-1182 
 #title Identification of a phorbol ester-repressible v-src-inducible 
        gene. 
 #cross-references MUID:89145206 
 #accession A41428 
 ##status preliminary 
 ##molecule_type mRNA 
 ##residues 1-375 ##label SIM 
 ##cross-references GB:J04496; NID:g211435; PID:g211436 
SUMMARY #length 375 #molecular-weight 40651 #checksum 1417 

Query Match 15.9%; Score 129; DB 2; Length 375; 

Best Local Similarity 33.3%; Pred. No. 6.59e-09; 

Matches 19; Conservative 11; Mismatches 25; Indels 2; Gaps 2; 



295 YAGCSSVKKYRPKYC-GSCVDGRCCTPQQTRTVKIRFRCDDGETFTKSVMMIQSCRC 350 

1 1 : 1 : = I : I |:| =11 I ::: I hi II :| I II 
44 YASCATASKVPIMECRGGC-GPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99 



RESULT 5 

ENTRY A35669 #type complete 
TITLE gene CYR61 protein precursor - mouse 
ORGANISM #formal_name Mus musculus #common_name house mouse 
DATE 28-Sep-1990 #sequence_revision 18-Nov-1992 #text_change 16-Dec-1998 
ACCESSIONS A35669; 148319; S16446 
REFERENCE A35669 
 #authors O'Brien, T.P.; Yang, G.P.; Sanders, L.; Lau, L.F. 
 #journal Mol. Cell. Biol. (1990) 10:3569-3577 
 #title Expression of cyr61, a growth factor-inducible 
        immediate-early gene. 
 #cross-references MUID:90287146 
 #accession A35669 
 ##status preliminary 
 ##molecule_type mRNA 
 ##residues 1-379 ##label OAB 
 ##cross-references GB:M32490; NID:g192909; PID:g309206 
 ##note the authors translated the codon GAT for residue 337 as Gln 
REFERENCE 148319 
 #authors Latinkic, B.V.; O'Brien, T.P.; Lau, L.F. 
 #journal Nucleic Acids Res. (1991) 19:3261-3267 
 #title Promoter function and structure of the growth 
        factor-inducible immediate early gene cyr61. 
 #cross-references MUID:91288203 
 #accession 148319 
 ##status translated from GB/EMBL/DDBJ 
 ##molecule_type DNA 
 ##residues 1-379 ##label RES 
 ##cross-references EMBL:X56790; NID:g50632; PID:g50633 
 ##note the authors did not translate the codon for residue 108 
 ##note the authors translated the codon GAT for residue 337 as Gln 
GENETICS 
 #gene CYR61 
 #introns 21/3; 93/1; 208/1; 279/3 
CLASSIFICATION #superfamily von Willebrand factor type C repeat homology 
FEATURE 
 99-166 #domain von Willebrand factor type C repeat homology #label VWC 
SUMMARY #length 379 #molecular-weight 41709 #checksum 3726 

Query Match 14.9%; Score 121; DB 2; Length 379; 

Best Local Similarity 33.3%; Pred. No. 2.01e-07; 

Matches 19; Conservative 11; Mismatches 25; Indels 2; Gaps 2; 

Db 298 YAGCSSVKKYRPKYC-GSCVDGRCCTPLQTRTVKMRFRCEDGEMFSKNVMMIQSCKC 353 

11:1:: I : I |:| :|| I ::: I |:| II I :| I I 
Qy 44 YASCATASKVPIMECRGGC-GPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99 



RESULT 6
ENTRY UFFTM #type complete
TITLE cadherin-related tumor suppressor precursor - fruit fly
      (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Sep-1993 #sequence_revision 30-Sep-1993 #text_change
      16-Feb-1997
ACCESSIONS A41087; B41087
REFERENCE A41087
   #authors Mahoney, P.A.; Weber, U.; Onofrechuk, P.; Biessmann, H.;
      Bryant, P.J.; Goodman, C.S.
   #journal Cell (1991) 67:853-868
   #title The fat tumor suppressor gene in Drosophila encodes a novel
      member of the cadherin gene superfamily.
   #cross-references MUID:92069752
   #accession A41087
   ##molecule_type mRNA
   ##residues 143-485;1279-5147 ##label MAH
   ##cross-references GB:M80537
   #accession B41087
   ##molecule_type DNA
   ##residues 1-142;487-1278 ##label MA2
   ##cross-references GB:M80537
   ##note 1229-Gly and 1233-Ser were also found
GENETICS
   #gene fat
   ##cross-references FlyBase:FBgn0001075
CLASSIFICATION #superfamily cadherin-related tumor suppressor; cadherin
      repeat homology; EGF homology
KEYWORDS calcium binding; cell adhesion; duplication; transmembrane
      protein
FEATURE
   1-35       #domain signal sequence #status predicted #label SIG\
   36-5147    #product cadherin-related tumor suppressor #status
                 predicted #label MAT\
   36-4583    #domain extracellular #status predicted #label EXT\
   51-156     #domain cadherin repeat homology #label CR1\
   159-270    #domain cadherin repeat homology #label CR2\
   271-382    #domain cadherin repeat homology #label CR3\
   390-494    #domain cadherin repeat homology #label CR4\
   497-599    #domain cadherin repeat homology #label CR5\
   602-708    #domain cadherin repeat homology #label CR6\
   718-822    #domain cadherin repeat homology #label CR7\
   831-942    #domain cadherin repeat homology #label CR8\
   948-1049   #domain cadherin repeat homology #label CR9\
   1052-1153  #domain cadherin repeat homology #label C10\
   1156-1278  #domain cadherin repeat homology #label C11\
   1281-1384  #domain cadherin repeat homology #label C12\
   1387-1489  #domain cadherin repeat homology #label C13\
   1492-1601  #domain cadherin repeat homology #label C14\
   1607-1713  #domain cadherin repeat homology #label C15\
   1717-1823  #domain cadherin repeat homology #label C16\
   1826-1922  #domain cadherin repeat homology #label C17\
   1925-2027  #domain cadherin repeat homology #label C18\
   2028-2167  #domain cadherin repeat homology #label C19\
   2169-2278  #domain cadherin repeat homology #label C20\
   2281-2384  #domain cadherin repeat homology #label C99\
   2387-2491  #domain cadherin repeat homology #label C21\
   2494-2596  #domain cadherin repeat homology #label C22\
   2599-2703  #domain cadherin repeat homology #label C23\
   2707-2810  #domain cadherin repeat homology #label C24\
   2813-2913  #domain cadherin repeat homology #label C25\
   2915-3013  #domain cadherin repeat homology #label C26\
   3014-3124  #domain cadherin repeat homology #label C27\
   3127-3229  #domain cadherin repeat homology #label C28\
   3232-3334  #domain cadherin repeat homology #label C29\
   3337-3439  #domain cadherin repeat homology #label C30\
   3442-3545  #domain cadherin repeat homology #label C31\
   3548-3651  #domain cadherin repeat homology #label C32\
   3654-3756  #domain cadherin repeat homology #label C33\
   3954-4010  #domain EGF homology #label EG1\
   4017-4048  #domain EGF homology #label EG2\
   4056-4089  #domain EGF homology #label EG3\
   4096-4127  #domain EGF homology #label EG4\
   4584-4609  #domain transmembrane #status predicted #label TMM\
   4610-5147  #domain intracellular #status predicted #label INT
SUMMARY #length 5147 #molecular-weight 564895 #checksum 6994

Query Match 14.3%; Score 116; DB 1; Length 5147;
Best Local Similarity 43.3%; Pred. No. 1.63e-06;
Matches 13; Conservative 9; Mismatches 6; Indels 2; Gaps 2;

Db 4067 CQRSPDGSSYFCLCRPGFRGNQCESVSDSC 4096
I: I :| :! 1 1 1 : 1 1 ( |::|: ::|
Qy 2 CHISDQGEPY-CLCQPGFSGEHCQQ-ENPC 29



RESULT 7 

ENTRY B26637 #type fragment
TITLE neurogenic repetitive locus 95F protein - fruit fly
      (Drosophila melanogaster) (fragment)
ORGANISM #formal_name Drosophila melanogaster
DATE 16-Aug-1988 #sequence_revision 16-Aug-1988 #text_change
      14-Aug-1998
ACCESSIONS B26637
REFERENCE A91081
   #authors Knust, E.; Dietrich, U.; Tepass, U.; Bremer, K.A.; Weigel,
      D.; Vaessin, H.; Campos-Ortega, J.A.
   #journal EMBO J. (1987) 6:761-766
   #title EGF homologous sequences encoded in the genome of Drosophila
      melanogaster, and their relation to neurogenic genes.
   #cross-references MUID:87218537
   #accession B26637
   ##molecule_type mRNA
   ##residues 1-293 ##label KNU
   ##cross-references GB:X05144; NID:g7519; PID:g929536
GENETICS
   #gene FlyBase:crb
   ##cross-references FlyBase:FBgn0000368
CLASSIFICATION #superfamily EGF homology
KEYWORDS transmembrane protein
FEATURE
   216-252 #domain EGF homology #label EGF
SUMMARY #length 293 #checksum 3413

Query Match 13.9%; Score 113; DB 2; Length 293;
Best Local Similarity 43.8%; Pred. No. 5.62e-06;
Matches 14; Conservative 8; Mismatches 8; Indels 2; Gaps 2;

Db 150 C-INQVAAFFCQCQPGFEGQHCEQNIDECADQ 180
I I:: : :l Mill MM: : | |
Qy 2 CHISDQGEPYCLCQPGFSGEHCQQE-NPCLGQ 32



RESULT 8
ENTRY A35672 #type complete
TITLE crumbs protein - fruit fly (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 21-Sep-1990 #sequence_revision 18-Nov-1992 #text_change
      14-Aug-1998
ACCESSIONS A35672
REFERENCE A35672
   #authors Tepass, U.; Theres, C.; Knust, E.
   #journal Cell (1990) 61:787-799
   #title crumbs encodes an EGF-like protein expressed on apical
      membranes of Drosophila epithelial cells and required for
      organization of epithelia.
   #cross-references MUID:90263104
   #accession A35672
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-2139 ##label TEP
   ##cross-references GB:M33753
   ##note the authors translated the codon GGC for residue 1928 as
      Cys, and TAT for residue 2023 as Gln
GENETICS
   #gene FlyBase:crb
   ##cross-references FlyBase:FBgn0000368
CLASSIFICATION #superfamily EGF homology
KEYWORDS transmembrane protein
FEATURE
   691-722 #domain EGF homology #label EGF
SUMMARY #length 2139 #molecular-weight 233619 #checksum 7230

Query Match 13.9%; Score 113; DB 2; Length 2139;
Best Local Similarity 43.8%; Pred. No. 5.62e-06;
Matches 14; Conservative 8; Mismatches 8; Indels 2; Gaps 2;

Db 1812 C-INQVAAFFCQCQPGFEGQHCEQNIDECADQ 1842
I M : :| Mil! MM : I I
Qy 2 CHISDQGEPYCLCQPGFSGEHCQQE-NPCLGQ 32

RESULT 9

ENTRY S42612 #type complete
TITLE transmembrane protein precursor - zebra fish
ORGANISM #formal_name Brachydanio rerio #common_name zebra fish
DATE 20-Feb-1995 #sequence_revision 20-Feb-1995 #text_change
      10-Jul-1998
ACCESSIONS S42612
REFERENCE S42612
   #authors Bierkamp, C.; Campos-Ortega, J.A.
   #journal Mech. Dev. (1993) 43:87-100
   #title A zebrafish homologue of the Drosophila neurogenic gene Notch
      and its pattern of transcription during early
      embryogenesis.
   #cross-references MUID:94128602
   #accession S42612
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-2437 ##label BIE
   ##cross-references EMBL:X69088; NID:g433866; PID:g433867
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
      repeat homology
FEATURE
   1915-1947 #domain ankyrin repeat homology #label AN1\
   1948-1980 #domain ankyrin repeat homology #label AN2\
   1982-2014 #domain ankyrin repeat homology #label AN3\
   2015-2047 #domain ankyrin repeat homology #label AN4\
   2048-2080 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2437 #molecular-weight 262306 #checksum 4021

Query Match 13.9%; Score 113; DB 2; Length 2437;
Best Local Similarity 51.6%; Pred. No. 5.62e-06;
Matches 16; Conservative 5; Mismatches 7; Indels 3; Gaps 3;

Db 1361 C-VSGHLSPRCLCAPGFSGHECQTRMDSPCL 1390
I :l : I III Mill II : ::|||
Qy 2 CHISDQGEPYCLCQPGFSGEHCQ-Q-ENPCL 30



RESULT 10 

ENTRY S45306 #type complete
TITLE notch 3 protein - mouse
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 20-Feb-1995 #sequence_revision 20-Feb-1995 #text_change
      10-Jul-1998
ACCESSIONS S45306
REFERENCE S45306
   #authors Lardelli, M.; Dahlstrand, J.; Lendahl, U.
   #journal Mech. Dev. (1994) 46:123-136
   #title The novel Notch homologue mouse Notch 3 lacks specific
      epidermal growth factor-repeats and is expressed in
      proliferating neuroepithelium.
   #cross-references MUID:95001556
   #accession S45306
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-2318 ##label LAR
   ##cross-references EMBL:X74760; NID:g483580; PID:g483581
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
      repeat homology
FEATURE
   1839-1871 #domain ankyrin repeat homology #label AN1\
   1872-1904 #domain ankyrin repeat homology #label AN2\
   1906-1938 #domain ankyrin repeat homology #label AN3\
   1939-1971 #domain ankyrin repeat homology #label AN4\
   1972-2004 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2318 #molecular-weight 244245 #checksum 9358

Query Match 13.8%; Score 112; DB 2; Length 2318;
Best Local Similarity 50.0%; Pred. No. 8.46e-06;
Matches 11; Conservative 8; Mismatches 2; Indels 1; Gaps 1;

Db 949 CLCRPGYTGTHCQYEADPCFSR 970
ll|:||::| III : :||:::
Qy 12 CLCQPGFSGEHCQ-QENPCLGQ 32



RESULT 11 

ENTRY S18188 #type complete
TITLE notch protein homolog - rat
ORGANISM #formal_name Rattus norvegicus #common_name Norway rat
DATE 19-Feb-1994 #sequence_revision 10-Nov-1995 #text_change
      12-Feb-1999
ACCESSIONS S18188
REFERENCE S18188
   #authors Weinmaster, G.; Roberts, V.J.; Lemke, G.
   #journal Development (1991) 113:199-205
   #title A homolog of Drosophila Notch expressed during mammalian
      development.
   #cross-references MUID:92111383
   #accession S18188
   ##molecule_type mRNA
   ##residues 1-2531 ##label WEI
   ##cross-references EMBL:X57405; NID:g57634; PID:g57635
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
      repeat homology
FEATURE
   1917-1949 #domain ankyrin repeat homology #label AN1\
   1950-1982 #domain ankyrin repeat homology #label AN2\
   1984-2016 #domain ankyrin repeat homology #label AN3\
   2017-2049 #domain ankyrin repeat homology #label AN4\
   2050-2082 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2531 #molecular-weight 270907 #checksum 2705

Query Match 13.7%; Score 111; DB 2; Length 2531;
Best Local Similarity 35.5%; Pred. No. 1.27e-05;
Matches 11; Conservative 12; Mismatches 7; Indels 1; Gaps 1;

Db 36 RCEVANGTEA-CVCSGAFVGQRCQDPSPCLS 65
:| ::: h |:| :| |::||: :|||:
Qy 1 QCHISDQGEPYCLCQPGFSGEHCQQENPCLG 31



RESULT 12 

ENTRY A46019 #type complete
TITLE gene Notch-1 protein - mouse
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 22-Sep-1993 #sequence_revision 18-Nov-1994 #text_change
      14-Aug-1998
ACCESSIONS A46019
REFERENCE A46019
   #authors del Amo, F.F.; Gendron-Maguire, M.; Swiatek, P.J.; Jenkins,
      N.A.; Copeland, N.G.; Gridley, T.
   #journal Genomics (1993) 15:259-264
   #title Cloning, analysis, and chromosomal localization of Notch-1, a
      mouse homolog of Drosophila Notch.
   #cross-references MUID:93194170
   #accession A46019
   ##status preliminary; not compared with conceptual translation
   ##molecule_type nucleic acid
   ##residues 1-2531 ##label DEL
   ##cross-references GB:Z11886; GB:S47228; NID:g288502; PID:g288503
   ##note sequence extracted from NCBI backbone (NCBIP:127318)
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
      repeat homology; EGF homology
FEATURE
   757-788   #domain EGF homology #label EGF\
   1917-1948 #domain ankyrin repeat homology #label AN1\
   1949-1981 #domain ankyrin repeat homology #label AN2\
   1983-2015 #domain ankyrin repeat homology #label AN3\
   2016-2048 #domain ankyrin repeat homology #label AN4\
   2049-2081 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2531 #molecular-weight 271312 #checksum 6611

Query Match 13.7%; Score 111; DB 2; Length 2531;
Best Local Similarity 24.2%; Pred. No. 1.27e-05;
Matches 24; Conservative 28; Mismatches 39; Indels 8; Gaps 8;




Db 589 CLCQPGYTGHHCETNINECHSQPCRHGGTCQDRDNSYLCLCLKGTTGPNCEINLDDCASN 648 

II: : I I :| I : |:::| I :: I I |:: 
Qy 12 CLCQPGFSGEHCQQE • NPCLGQWREV • I - R- RQKGYAS -CAT ASKVP IMECR- GGCGPQ 65 

Db 649 PCDSGTCLDKIDGYECACEPGYTGSMCNVNIDECAGSPC 687 

I:: : ::|: ::: : : :: ||: ;| 
Qy 66 CCQPTRSKRRKY VFQCTDGSSFVEEV - ERHL • ECGCLAC 102 



RESULT 13
ENTRY H71707 #type complete
TITLE hypothetical protein RP006 - Rickettsia prowazekii
ORGANISM #formal_name Rickettsia prowazekii
DATE 21-Nov-1998 #sequence_revision 21-Nov-1998 #text_change
      21-Nov-1998
ACCESSIONS H71707
REFERENCE A71630
   #authors Andersson, S.G.E.; Zomorodipour, A.; Andersson, J.O.;
      Sicheritz-Ponten, T.; Alsmark, U.C.M.; Podowski, R.M.;
      Naeslund, A.K.; Eriksson, A.S.; Winkler, H.H.; Kurland,
      C.G.
   #journal Nature (1998) 396:133-140
   #title The genome sequence of Rickettsia prowazekii and the origin
      of mitochondria.
   #accession H71707
   ##status preliminary; nucleic acid sequence not shown;
      translation not shown
   ##molecule_type DNA
   ##residues 1-706 ##label AND
   ##cross-references GB:AJ235270; GB:AJ235269; NID:g3860572; PID:e1342322;
      PID:g3860578
   ##experimental_source strain Madrid E
GENETICS
   #gene RP006
SUMMARY #length 706 #molecular-weight 80294 #checksum 4323

Query Match 13.5%; Score 110; DB 2; Length 706;
Best Local Similarity 36.6%; Pred. No. 1.91e-05;
Matches 15; Conservative 13; Mismatches 9; Indels 4; Gaps 3;

Db 649 SLMQCMMKGICG-QCIQKVKGKQ-KYIFACSEQNQNVEIID 687 

::|:| :| II II I ::|: Ihl |:: : II :: 
Qy 54 PIMEC--RGGCGPQCCQPTRSKRRKYVFQCTDGSSFVEEVE 92 



RESULT 14
ENTRY A49128 #type complete
TITLE cell-fate determining gene Notch2 protein - rat
ORGANISM #formal_name Rattus norvegicus #common_name Norway rat
DATE 21-Jan-1994 #sequence_revision 18-Nov-1994 #text_change
      14-Aug-1998
ACCESSIONS A49128
REFERENCE A49128
   #authors Weinmaster, G.; Roberts, V.J.; Lemke, G.
   #journal Development (1992) 116:931-941
   #title Notch2: a second mammalian Notch gene.
   #cross-references MUID:93202015
   #accession A49128
   ##status preliminary; not compared with conceptual translation
   ##molecule_type mRNA
   ##residues 1-2471 ##label WEI
   ##experimental_source Schwann cell
   ##note sequence extracted from NCBI backbone (NCBIP:127811)
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
      repeat homology; EGF homology
FEATURE
   1029-1060 #domain EGF homology #label EGF\
   1876-1908 #domain ankyrin repeat homology #label AN1\
   1909-1941 #domain ankyrin repeat homology #label AN2\
   1943-1975 #domain ankyrin repeat homology #label AN3\
   1976-2008 #domain ankyrin repeat homology #label AN4\
   2009-2041 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2471 #molecular-weight 265367 #checksum 5929

Query Match 13.5%; Score 110; DB 2; Length 2471;
Best Local Similarity 46.4%; Pred. No. 1.91e-05;
Matches 13; Conservative 2; Mismatches 13; Indels 0; Gaps 0;




Db 41 CVTYHNGTGYCRCPEGFLGEYCQHRDPC 68 

I I II I II II II: :|| 
Qy 2 CHISDQGEPYCLCQPGFSGEHCQQENPC 29 



RESULT 15
ENTRY A49175 #type fragment
TITLE Motch B protein - mouse (fragment)
ALTERNATE_NAMES Notch homolog
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 21-Jan-1994 #sequence_revision 05-Jan-1996 #text_change
      14-Aug-1998
ACCESSIONS A49175; PH1570; S32113
REFERENCE A49175
   #authors Lardelli, M.; Lendahl, U.
   #journal Exp. Cell Res. (1993) 204:364-372
   #title Motch A and Motch B-two mouse Notch homologues coexpressed
      in a wide variety of tissues.
   #cross-references MUID:93178563
   #accession A49175
   ##status preliminary; nucleic acid sequence not shown
   ##molecule_type mRNA
   ##residues 1-1203 ##label LAR
   ##cross-references EMBL:X68279; NID:g287989; PID:g287990
   ##experimental_source embryo
   ##note sequence extracted from NCBI backbone (NCBIP:126158)
COMMENT This protein has many EGF repeats and lin-12/Notch repeats.
COMMENT This protein is one of the neurogenic proteins controlling the
      decision between ectodermal and neural fate for cells in the early
      embryo.
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
      repeat homology; EGF homology
FEATURE
   560-591 #domain EGF homology #label EGF
SUMMARY #length 1203 #checksum 910

Query Match 13.3%; Score 108; DB 2; Length 1203;
Best Local Similarity 50.0%; Pred. No. 4.29e-05;
Matches 16; Conservative 6; Mismatches 6; Indels 4; Gaps 4;

Db 571 CH-NTQG-SYVCECPPGFSGMDCEEDINDCLA 600
II : II :| I I Mill I::: I II:
Qy 2 CHISDQGEPY-CLCQPGFSGEHCQQE-NPCLG 31



Search completed: Fri May 28 C 
Job time : 21 secs.

Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on:          Fri May 28 08:59:48 1999; MasPar time 5.03 Seconds
                 578.731 Million cell updates/sec

Tabular output not generated.

Title:           >US-09-191-647-6
Description:     (1-103) from US09191647.pep
Perfect Score:   813
Sequence:        1 QCHISDQGEPYCLCQPGFSG GSSFVEEVERHLECGCLACS 103

Scoring table:   PAM 150
                 Gap 11

Searched:        77977 seqs, 28268293 residues

Post-processing: Minimum Match 0%
                 Listing first 45 summaries

Database:        swiss-prot37
                 1:swissprot
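For orientation, the local-alignment score that this kind of search reports can be sketched in a few lines. This is a minimal illustration of the Smith-Waterman recurrence and not the MPsrch implementation: the scoring function `toy_score` is a hypothetical stand-in for the PAM 150 table, and treating "Gap 11" as a flat per-residue gap penalty is an assumption.

```python
def smith_waterman(a: str, b: str, score, gap: int = 11) -> int:
    """Return the best local alignment score of a and b using a
    linear (per-residue) gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # h[i][j]: best score ending at a[i-1], b[j-1]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(
                0,                                            # start a new local alignment
                h[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align two residues
                h[i - 1][j] - gap,                            # gap in b
                h[i][j - 1] - gap,                            # gap in a
            )
            best = max(best, h[i][j])
    return best


def toy_score(x: str, y: str) -> int:
    """Hypothetical scoring: +5 for identity, -2 otherwise (not PAM 150)."""
    return 5 if x == y else -2


print(smith_waterman("XXCLCQXX", "YYCLCQYY", toy_score))
```

Each inner-loop step fills one matrix cell, which is the unit counted by the "cell updates/sec" figure in the run header.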

Statistics: Mean 37.938; Variance 55.390; scale 0.685 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



Result         Query
   No.  Score  Match  Length  DB  ID           Description              Pred. No.
---------------------------------------------------------------------------------
     1    139   17.1    1480   1  SLIT_DROME   SLIT PROTEIN PRECURSOR   8.76e-13
     2    129   15.9     375   1  CE10_CHICK   CEF-10 PROTEIN PRECURS   1.31e-10
     3    121   14.9     379   1  CYR6_MOUSE   CYR61 PROTEIN PRECURSO   6.52e-09
     4    116   14.3     381   1  CYR6_HUMAN   CYR61 PROTEIN PRECURSO   7.11e-08
     5    116   14.3    5147   1  FAT_DROME    CADHERIN-RELATED TUMOR   7.11e-08
     6    113   13.9    2139   1  CRB_DROME    CRUMBS PROTEIN PRECURS   2.92e-07
     7    113   13.9    2437   1  NOTC_BRARE   NEUROGENIC LOCUS NOTCH   2.92e-07
     8    112   13.8    2318   1  NTC3_MOUSE   NEUROGENIC LOCUS NOTCH   4.65e-07
     9    111   13.7    2531   1  NTC1_RAT     NEUROGENIC LOCUS NOTCH   7.40e-07
    10    111   13.7    2531   1  NTC1_MOUSE   NEUROGENIC LOCUS NOTCH   7.40e-07
    11    110   13.5    1964   1  NTC4_MOUSE   NEUROGENIC LOCUS NOTCH   1.18e-06
    12    108   13.3    2524   1  NOTC_XENLA   NEUROGENIC LOCUS NOTCH   2.95e-06
    13    105   12.9    2813   1  VWF_HUMAN    VON WILLEBRAND FACTOR    1.16e-05
    14    105   12.9    3707   1  PGBM_MOUSE   BASEMENT MEMBRANE-SPEC   1.16e-05
    15    105   12.9    4543   1  LRP1_CHICK   LOW-DENSITY LIPOPROTEI   1.16e-05
    16    103   12.7    2415   1  PGCA_HUMAN   AGGRECAN CORE PROTEIN    2.84e-05
    17    103   12.7    2703   1  NOTC_DROME   NEUROGENIC LOCUS NOTCH   2.84e-05
    18    102   12.5     570   1  FBP3_STRPU   FIBROPELLIN C PRECURSO   4.44e-05
    19    102   12.5     603   1  FA12_CAVPO   COAGULATION FACTOR XII   4.44e-05
    20    102   12.5     615   1  FA12_HUMAN   COAGULATION FACTOR XII   4.44e-05
    21    102   12.5    2444   1  NTC1_HUMAN   NEUROGENIC LOCUS NOTCH   4.44e-05
    22    101   12.4    1056   1  MUC5_HUMAN   TRACHEOBRONCHIAL MUCIN   6.92e-05
    23    101   12.4    4393   1  PGBM_HUMAN   BASEMENT MEMBRANE-SPEC   6.92e-05
    24    100   12.3     723   1  DLL1_HUMAN   DELTA-LIKE PROTEIN 1 P   1.08e-04
    25     99   12.2     427   1  MFGM_RAT     MILK FAT GLOBULE-EGF F   1.67e-04
    26     96   11.8     383   1  DLK_HUMAN    DELTA-LIKE PROTEIN PRE   6.14e-04
    27     96   11.8     463   1  MFGM_MOUSE   MILK FAT GLOBULE-EGF F   6.14e-04
    28     96   11.8    3672   1  LML2_CAEEL   LAMININ-LIKE PROTEIN K   6.14e-04
    29     96   11.8    3712   1  LMA_DROME    LAMININ ALPHA CHAIN PR   6.14e-04
    30     95   11.7     385   1  DLK_MOUSE    DELTA-LIKE PROTEIN PRE   9.44e-04
    31     95   11.7     880   1  DL_DROME     NEUROGENIC LOCUS DELTA   9.44e-04
    32     95   11.7    1064   1  FBP1_STRPU   FIBROPELLIN I PRECURSO   9.44e-04
    33     94   11.6      84   1  HBGF_PIG     HEPARIN-BINDING EGF-LI   1.45e-03
    34     94   11.6     208   1  HBGF_HUMAN   HEPARIN-BINDING EGF-LI   1.45e-03
    35     94   11.6     208   1  HBGF_CERAE   HEPARIN-BINDING EGF-LI   1.45e-03
    36     92   11.3     611   1  LEM2_CANFA   E-SELECTIN PRECURSOR (   3.37e-03
    37     92   11.3    1429   1  LI12_CAEEL   LIN-12 PROTEIN PRECURS   3.37e-03
    38     92   11.3    4544   1  LRP1_HUMAN   LOW-DENSITY LIPOPROTEI   3.37e-03
    39     91   11.2     121   1  TGFA_MACMU   TRANSFORMING GROWTH FA   5.12e-03
    40     91   11.2     160   1  TGFA_PIG     TRANSFORMING GROWTH FA   5.12e-03
    41     91   11.2     160   1  TGFA_HUMAN   TRANSFORMING GROWTH FA   5.12e-03
    42     91   11.2     714   1  DLL1_RAT     DELTA-LIKE PROTEIN 1 P   5.12e-03
    43     91   11.2     722   1  DLL1_MOUSE   DELTA-LIKE PROTEIN 1 P   5.12e-03
    44     91   11.2    1168   1  LMB3_MOUSE   LAMININ BETA-3 CHAIN P   5.12e-03
    45     91   11.2    1955   1  AGRI_CHICK   AGRIN PRECURSOR.         5.12e-03
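The "Query Match" column appears to be the hit score expressed as a percentage of the "Perfect Score" (813) given in the run header. The report does not state this rule; it is an inference that agrees with the listed rows, and the short check below (function name is illustrative only) makes the arithmetic explicit.

```python
PERFECT_SCORE = 813  # "Perfect Score" from the run header of this search


def query_match_percent(score: int, perfect: int = PERFECT_SCORE) -> float:
    """Score as a percentage of the query's self-alignment score."""
    return 100.0 * score / perfect


# (score, listed Query Match %) pairs copied from the summary table.
listed_rows = [(139, 17.1), (129, 15.9), (121, 14.9), (116, 14.3), (113, 13.9)]

for score, listed in listed_rows:
    computed = query_match_percent(score)
    print(f"score {score}: computed {computed:.2f}%, listed {listed}%")
```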



RESULT 1
ID SLIT_DROME STANDARD; PRT; 1480 AA.
AC P24014;
DT 01-MAR-1992 (REL. 21, CREATED)
DT 01-MAR-1992 (REL. 21, LAST SEQUENCE UPDATE)
DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE)
DE SLIT PROTEIN PRECURSOR.
GN SLI.
OS DROSOPHILA MELANOGASTER (FRUIT FLY).
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 91099665.
RA ROTHBERG J.M., JACOBS J.R., GOODMAN C.S., ARTAVANIS-TSAKONAS S.;
RT "Slit: an extracellular protein necessary for development of midline
RT glia and commissural axon pathways contains both EGF and LRR
RT domains.";
RL GENES DEV. 4:2169-2187(1990).
CC -!- FUNCTION: NECESSARY FOR DEVELOPMENT OF MIDLINE GLIA AND
CC COMMISSURAL AXON PATHWAYS. SLIT MAY INTERACT WITH EXTRACELLULAR
CC MATRIX MOLECULES.
CC -!- TISSUE SPECIFICITY: EXCRETED BY THE MIDLINE GLIA CELLS AND
CC EVENTUALLY DISTRIBUTED ALONG THE AXONS.
CC -!- ALTERNATIVE PRODUCTS: GIVES RISE TO 2 DISTINCT PROTEINS DIFFERING
CC BY 11 AA AT THE C-TERMINUS OF THE LAST EGF REPEAT.
CC -!- SIMILARITY: CONTAINS 7 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 22. TWO BLOCKS OF 6 LRR'S
CC AND TWO BLOCKS OF 5 LRR'S.
CC -!- SIMILARITY: CONTAINS A C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK).
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; X53959; G8615; -.
DR PIR; A36665; A36665.
DR FLYBASE; FBgn0003425; sli.
DR PROSITE; PS00010; ASX_HYDROXYL; 3.
DR PROSITE; PS00022; EGF_1; 7.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 5.
DR PROSITE; PS01187; EGF_CA; 2.
DR PROSITE; PS01225; CTCK_2; 1.
DR PFAM; PF00007; Cys_knot; 1.
DR PFAM; PF00008; EGF; 7.
DR PFAM; PF00054; laminin_G; 1.
DR PFAM; PF00560; LRR; 10.
DR HSSP; P00740; 1IXA.
KW NEUROGENESIS; GLYCOPROTEIN; SIGNAL; ALTERNATIVE SPLICING;
KW EGF-LIKE DOMAIN; REPEAT; LEUCINE-REPEAT; DUPLICATION.



FT   SIGNAL        1     36
FT   DOMAIN       70    104       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      105              LEUCINE-RICH REPEATS (1ST REGION).
FT   DOMAIN             294       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN      295    326       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN                       LEUCINE-RICH REPEATS (2ND REGION).
FT   DOMAIN                       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN      519    550       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      551    653       LEUCINE-RICH REPEATS (3RD REGION).
FT   DOMAIN                       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN      715    746       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      747    848       LEUCINE-RICH REPEATS (4TH REGION).
FT   DOMAIN      849    910       CONSERVED C-FLANKING REGION OF THE LRR.
FT   REPEAT      105    115       LRR 1-1.
FT   REPEAT      116    139
FT   REPEAT      140    163       LRR 1-3.
FT   REPEAT
FT   REPEAT      188    211       LRR 1-5.
FT   REPEAT      212    230       LRR 1-6.
FT   REPEAT                       LRR 2-1.
FT   REPEAT      338    361       LRR 2-2.
FT   REPEAT             385       LRR 2-3.
FT   REPEAT                       LRR 2-4.
FT   REPEAT      410    433       LRR 2-5.
FT   REPEAT      434    452       LRR 2-6.
FT   REPEAT
FT   REPEAT      563    586       LRR 3-2.
FT   REPEAT      587    610       LRR 3-3.
FT   REPEAT      611    634
FT   REPEAT      635    653       LRR 3-5.
FT   REPEAT      747    757       LRR 4-1.
FT   REPEAT      758    781       LRR 4-2.
FT   REPEAT      782    805
FT   REPEAT      806    829
FT   REPEAT      830    848       LRR 4-5.
FT   DOMAIN      907    944       EGF-LIKE 1.
FT   DOMAIN      946    983
FT   DOMAIN      985   1022
FT   DOMAIN     1024   1062
FT   DOMAIN     1064   1100       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1111   1149       EGF-LIKE 6.


FT   DOMAIN     1353   1392       EGF-LIKE 7.
FT   DOMAIN     1409   1480       CTCK.
FT   VARSPLIC   1394   1404       MISSING (IN SHORT FORM).
FT   CARBOHYD    111    111       POTENTIAL.
FT   CARBOHYD    207    207       POTENTIAL.
FT   CARBOHYD    357    357       POTENTIAL.
FT   CARBOHYD    435    435       POTENTIAL.
FT   CARBOHYD    783    783       POTENTIAL.
FT   CARBOHYD    788    788       POTENTIAL.
FT   CARBOHYD    958    958       POTENTIAL.
FT   CARBOHYD    998    998       POTENTIAL.
FT   CARBOHYD   1060   1060       POTENTIAL.
FT   CARBOHYD   1159   1159       POTENTIAL.
FT   CARBOHYD   1175   1175       POTENTIAL.
FT   CARBOHYD   1243   1243       POTENTIAL.
FT   CARBOHYD   1292   1292       POTENTIAL.
FT   DISULFID    911    922       BY SIMILARITY.
FT   DISULFID    916    932       BY SIMILARITY.
FT   DISULFID    934    943       BY SIMILARITY.
FT   DISULFID    950    961       BY SIMILARITY.
FT   DISULFID    955    971       BY SIMILARITY.
FT   DISULFID    973    982       BY SIMILARITY.
FT   DISULFID    989   1001       BY SIMILARITY.
FT   DISULFID    995   1010       BY SIMILARITY.
FT   DISULFID   1012   1021       BY SIMILARITY.
FT   DISULFID   1028   1041       BY SIMILARITY.
FT   DISULFID   1035   1050       BY SIMILARITY.
FT   DISULFID   1052   1061       BY SIMILARITY.
FT   DISULFID   1068   1079       BY SIMILARITY.
FT   DISULFID   1073   1088       BY SIMILARITY.
FT   DISULFID   1090   1099       BY SIMILARITY.
FT   DISULFID   1115   1125       BY SIMILARITY.
FT   DISULFID   1120   1137       BY SIMILARITY.
FT   DISULFID   1139   1148       BY SIMILARITY.
FT   DISULFID   1357   1368       BY SIMILARITY.
FT   DISULFID   1362   1380       BY SIMILARITY.
FT   DISULFID   1382   1391       BY SIMILARITY.
FT   DISULFID   1409   1443       BY SIMILARITY.
FT   DISULFID   1423   1457       BY SIMILARITY.
FT   DISULFID   1434   1473       BY SIMILARITY.
FT   DISULFID   1438   1475       BY SIMILARITY.
FT   DISULFID   1442   1479       BY SIMILARITY.
SQ   SEQUENCE   1480 AA;  165752 MW;  2CD1C421 CRC32;

Query Match 17.1%; Score 139; DB 1; Length 1480;
Best Local Similarity 35.7%; Pred. No. 8.76e-13;
Matches 15; Conservative 9; Mismatches 18; Indels 0; Gaps 0;

Db 1434 CVGGCGNQCCAAKIVRRRKVRMVCSNNRKYIKNLDIVRKCGC 1475 

I Mil III : :||| I:: :: ::: III 
Qy 58 CRGGCGPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99 



RESULT 2 

ID CE10_CHICK STANDARD; PRT; 375 AA.
AC P19336;
DT 01-NOV-1990 (REL. 16, CREATED)
DT 01-NOV-1990 (REL. 16, LAST SEQUENCE UPDATE)
DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)
DE CEF-10 PROTEIN PRECURSOR.
OS GALLUS GALLUS (CHICKEN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ARCHOSAURIA; AVES;
OC NEOGNATHAE; GALLIFORMES; PHASIANIDAE; PHASIANINAE; GALLUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 89145206.
RA SIMMONS D.L., LEVY D.B., YANNONI Y., ERIKSON R.L.;
RT "Identification of a phorbol ester-repressible v-src-inducible gene.";
RL PROC. NATL. ACAD. SCI. U.S.A. 86:1178-1182(1989).
CC -!- FUNCTION: PROBABLE SECRETED REGULATORY PROTEIN.
CC -!- INDUCTION: BY V-SRC.
CC -!- SIMILARITY: BELONGS TO THE INSULIN-LIKE GROWTH FACTOR BINDING
CC PROTEIN FAMILY. CEF-10/CYR61/CTGF/FISP-12/NOV PROTEIN SUBFAMILY.
CC -!- SIMILARITY: CONTAINS 1 VWFC DOMAIN.
CC -!- SIMILARITY: CONTAINS 1 C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK).
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; J04496; G211436; -.
DR PIR; A41428; A41428.
DR PROSITE; PS00222; IGF_BINDING; 1.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01225; CTCK_2; 1.
DR PROSITE; PS01208; VWFC; 1.
DR PFAM; PF00007; Cys_knot; 1.
DR PFAM; PF00090; tsp_1; 1.
DR PFAM; PF00093; VWC; 1.
DR PFAM; PF00219; IGFBP; 1.
KW GROWTH FACTOR BINDING; SIGNAL.



FT SIGNAL 1 22
FT CHAIN 23 375 CEF-10 PROTEIN.
FT DOMAIN 98 164 VWFC.
FT DOMAIN 281 355 CTCK.
FT DISULFID 281 318 BY SIMILARITY.
FT DISULFID 298 332 BY SIMILARITY.
FT DISULFID 309 348 BY SIMILARITY.
FT DISULFID 312 350 BY SIMILARITY.
FT DISULFID 317 354 BY SIMILARITY.
SQ SEQUENCE 375 AA; 40651 MW; 68B4BC92 CRC32;


Query Match 15.9%; Score 129; DB 1;

Best Local Similarity 33.3%; Pred. No. 1.31e-10;

Matches 19; Conservative 11; Mismatches 25; Indels






Db 295 YAGCSSVKKYRPKYC-GSCVDGRCCTPQQTRTVKIRFRCDDGETFTKSVMMIQSCRC 350

11:1:: I : I hi :l! I ::: I hi II :| I I I
Qy 44 YASCATASKVPIMECRGGC-GPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99



RESULT 3
ID CYR6_MOUSE STANDARD; PRT; 379 AA.
AC P18406;
DT 01-NOV-1990 (REL. 16, CREATED)
DT 01-NOV-1990 (REL. 16, LAST SEQUENCE UPDATE)
DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE)
DE CYR61 PROTEIN PRECURSOR (3CH61).
GN IGFBP10 OR CYR61.
OS MUS MUSCULUS (MOUSE).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=BALB/C; TISSUE=FIBROBLAST;
RX MEDLINE; 90287146.
RA O'BRIEN T.P., YANG G.P., SANDERS L., LAU L.F.;
RT "Expression of cyr61, a growth factor-inducible immediate-early
RT gene.";
RL MOL. CELL. BIOL. 10:3569-3577(1990).
RN [2]
RP SEQUENCE FROM N.A.
RC STRAIN=AJ; TISSUE=EMBRYONIC FIBROBLAST;
RX MEDLINE; 91288203.
RA LATINKIC B.V., O'BRIEN T.P., LAU L.F.;
RT "Promoter function and structure of the growth factor-inducible
RT immediate early gene cyr61.";
RL NUCLEIC ACIDS RES. 19:3261-3267(1991).
CC -!- FUNCTION: MAY ACT AS ONE OF THE MANY GROWTH FACTOR-BINDING
CC PROTEINS; PROMOTES PROLIFERATION, MIGRATION AND ADHESION.
CC -!- TISSUE SPECIFICITY: LOW IN KIDNEY, ADRENAL GLAND, TESTES, BRAIN,
CC AND OVARY, MODERATE IN HEART, UTERUS, AND SKELETAL MUSCLE, HIGHEST
CC IN LUNG.
CC -!- DEVELOPMENTAL STAGE: EXPRESSED FROM G(0)/G(1) THROUGH MID-G(1) IN
CC NORMAL CELLS, AND AT A CONSTANT LEVEL IN RAPIDLY GROWING CELLS.
CC -!- INDUCTION: BY GROWTH FACTORS.
CC -!- SIMILARITY: BELONGS TO THE INSULIN-LIKE GROWTH FACTOR BINDING
CC PROTEIN FAMILY. CEF-10/CYR61/CTFG/FISP-12/NOV PROTEIN SUBFAMILY.
CC -!- SIMILARITY: CONTAINS 1 VWFC DOMAIN.
CC -!- SIMILARITY: CONTAINS 1 C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK).
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; M32490; G309206; -.
DR EMBL; X56790; G50633; -.
DR PIR; A35669; A35669.
DR MGD; MGI:88613; IGFBP10.
DR PROSITE; PS00222; IGF_BINDING; 1.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01225; CTCK_2; 1.
DR PROSITE; PS01208; VWFC; 1.
DR PFAM; PF00007; Cys_knot; 1.
DR PFAM; PF00090; tsp_1; 1.
DR PFAM; PF00093; vwc; 1.
DR PFAM; PF00219; IGFBP; 1.
KW GROWTH FACTOR BINDING; SIGNAL.



FT SIGNAL 1 24 POTENTIAL.
FT CHAIN 25 379 CYR61 PROTEIN.
FT DOMAIN 98 164 VWFC.
FT DOMAIN 284 358 CTCK.
FT DISULFID 284 321 BY SIMILARITY.
FT DISULFID 301 335 BY SIMILARITY.
FT DISULFID 312 351 BY SIMILARITY.
FT DISULFID 315 353 BY SIMILARITY.
FT DISULFID 320 357 BY SIMILARITY.
SQ SEQUENCE 379 AA; 41709 MW; 116B80C7 CRC32;


Query Match 14.9%; Score 121; DB 1;

Best Local Similarity 33.3%; Pred. No. 6.52e-09;

Matches 19; Conservative 11; Mismatches 25; Indels 2; Gaps 2;

Db 298 YAGCSSVKKYRPKYC-GSCVDGRCCTPLQTRTVKMRFRCEDGEMFSKNVMMIQSCKC 353 

Ihh: I : I hi :|| I ::: I hi II I :| II 
Qy 44 YASCATASKVPIMECRGGC-GPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99 



RESULT 4 

ID CYR6_HUMAN STANDARD; PRT; 381 AA.

AC O00622; O14934;

DT 15-JUL-1998 (REL. 36, CREATED) 

DT 15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE) 

DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE) 

DE CYR61 PROTEIN PRECURSOR (GIG1 PROTEIN) (INSULIN-LIKE GROWTH FACTOR- 

DE BINDING PROTEIN 10).

GN IGFBP10 OR CYR61 OR GIG1.

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RA ALBRECHT C., VON DER KAMMER H., KLAUDINY J., MAYHAUS M., NITSCH R.M.;
RL SUBMITTED (JUN-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.
RN [2]
RP SEQUENCE FROM N.A.
RX MEDLINE; 97280750.
RA JAY P., BERGE-LEFRANC J.L., MARSOLLIER C., MEJEAN C., TAVIAUX S.,
RA BERTA P.;
RT "The human growth factor-inducible immediate early gene, CYR61, maps
RT to chromosome 1p.";
RL ONCOGENE 14:1753-1757(1997).
RN [3]
RP SEQUENCE FROM N.A.
RC TISSUE=PLACENTA;
RA KOLESNIKOVA T.V., LAU L.F.;
RL SUBMITTED (JUN-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.
RN [4]
RP SEQUENCE FROM N.A.
RA BI A.B., YU L.;
RL SUBMITTED (NOV-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.
CC -!- FUNCTION: MAY ACT AS ONE OF THE MANY GROWTH FACTOR-BINDING
CC PROTEINS; PROMOTES PROLIFERATION, MIGRATION AND ADHESION (BY
CC SIMILARITY).
CC -!- SIMILARITY: BELONGS TO THE INSULIN-LIKE GROWTH FACTOR BINDING
CC PROTEIN FAMILY. CEF-10/CYR61/CTFG/FISP-12/NOV PROTEIN SUBFAMILY.
CC -!- SIMILARITY: CONTAINS 1 VWFC DOMAIN.
CC -!- SIMILARITY: CONTAINS 1 C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK).




CC This SWISS-PROT entry is copyright. It is produced through a collaboration 
CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC



DR EMBL; Y12084; E311857; -.
DR EMBL; U62015; G2130527; -.
DR EMBL; AF003594; G2196782; -.
DR EMBL; AF031385; G2606094; -.
DR MIM; 602369; -.
DR PROSITE; PS00222; IGF_BINDING;
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01225; CTCK_2; 1.
DR PROSITE; PS01208; VWFC; 1.
DR PFAM; PF00007; Cys_knot; 1.
DR PFAM; PF00090; tsp_1; 1.
DR PFAM; PF00093; vwc; 1.
DR PFAM; PF00219; IGFBP; 1.
KW GROWTH FACTOR BINDING; SIGNAL.






Best Local Similarity 33.3%; Pred. No. 7.11e-08;
Matches 19; Conservative 10; Mismatches 26; Indels 2; Gaps 

Db 300 YAGCLSVKKYRPKYC-GSCVDGRCCTPQLTRTVKMRFRCEDGETFSKNVMMIQSCKC 355 

11:1 : I : I h! :!l I :: I 1:1 II :| :| II 
Qy 44 YASCATASKVPIMECRGGC-GPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99



FT SIGNAL 1 24 POTENTIAL.
FT CHAIN 25 381 CYR61 PROTEIN.
FT DOMAIN 98 164 VWFC.
FT DOMAIN 286 360 CTCK.
FT DISULFID 286 323 BY SIMILARITY.
FT DISULFID 303 337 BY SIMILARITY.
FT DISULFID 314 353 BY SIMILARITY.
FT DISULFID 317 355 BY SIMILARITY.
FT DISULFID 322 359 BY SIMILARITY.
FT CONFLICT 210 210 L -> I (IN REF. 4).
FT CONFLICT 220 220 L -> R (IN REF. 4).
SQ SEQUENCE 381 AA; 42026 MW; 2B091D9E CRC32;

Query Match 14.3%; Score 116; DB 1;



RESULT 5
ID FAT_DROME STANDARD; PRT; 5147 AA.
AC P33450;
DT 01-FEB-1994 (REL. 28, CREATED)
DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE CADHERIN-RELATED TUMOR SUPPRESSOR PRECURSOR (FAT PROTEIN).
GN FT.
OS DROSOPHILA MELANOGASTER (FRUIT FLY).
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 92069752.
RA MAHONEY P.A., WEBER U., ONOFRECHUK P., BIESSMANN H., BRYANT P.J.,
RA GOODMAN C.S.;
RT "The fat tumor suppressor gene in Drosophila encodes a novel member
RT of the cadherin gene superfamily.";
RL CELL 67:853-868(1991).
CC -!- FUNCTION: COULD FUNCTION AS A CELL-ADHESION PROTEIN.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DISEASE: RECESSIVE LETHAL MUTATIONS IN FAT CAUSE HYPERPLASTIC,
CC TUMOR-LIKE OVERGROWTH OF LARVAL IMAGINAL DISCS, DEFECTS IN
CC DIFFERENTIATION AND MORPHOGENESIS, AND DEATH DURING THE PUPAL
CC STAGE.
CC -!- SIMILARITY: BELONGS TO THE CADHERIN FAMILY.
CC -!- SIMILARITY: CONTAINS 37 CADHERIN-TYPE REPEATS.
CC -!- SIMILARITY: CONTAINS 5 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 2 LAMININ G-LIKE DOMAINS.



CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; M80537; G157409; -.
DR PIR; A41087; IJFFTM.
DR FLYBASE; FBgn0001075; ft.
DR PROSITE; PS00232; CADHERIN; 22.
DR PROSITE; PS00022; EGF_1; 4.
DR PROSITE; PS01186; EGF_2; 2.
DR PFAM; PF00008; EGF; 4.
DR PFAM; PF00028; cadherin; 33.
DR PFAM; PF00054; laminin_G; 1.
DR HSSP; P00740; 1IXA.
KW CELL ADHESION; SIGNAL; TRANSMEMBRANE; CYTOSKELETON; GLYCOPROTEIN;
KW CALCIUM-BINDING; REPEAT; EGF-LIKE DOMAIN.


FT SIGNAL 1 35 POTENTIAL.
FT CHAIN 36 5147 CADHERIN-RELATED TUMOR SUPPRESSOR.
FT DOMAIN 36 4583 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 4584 4609 POTENTIAL.
FT DOMAIN 4610 5147 CYTOPLASMIC (POTENTIAL).
FT REPEAT 36 156 CADHERIN 1.
FT REPEAT 157 270 CADHERIN 2.
FT REPEAT 271 382 CADHERIN 3.
FT REPEAT 383 494 CADHERIN 4.
FT REPEAT 495 599 CADHERIN 5.
FT REPEAT 600 708 CADHERIN 6.
FT REPEAT 709 820 CADHERIN 7.
FT REPEAT 821 942 CADHERIN 8.
FT REPEAT 943 1049 CADHERIN 9.
FT REPEAT 1050 1153 CADHERIN 10.
FT REPEAT 1154 1278 CADHERIN 11.
FT REPEAT 1279 1384 CADHERIN 12.
FT REPEAT 1385 1489 CADHERIN 13.
FT REPEAT 1490 1601 CADHERIN 14.
FT REPEAT 1602 1713 CADHERIN 15.
FT REPEAT 1714 1823 CADHERIN 16.
FT REPEAT 1824 1922 CADHERIN 17.
FT REPEAT 1923 2027 CADHERIN 18.
FT REPEAT 2028 2167 CADHERIN 19.
FT REPEAT 2168 2278 CADHERIN 20.
FT REPEAT 2279 2385 CADHERIN 21.
FT REPEAT 2386 2491 CADHERIN 22.
FT REPEAT 2492 2596 CADHERIN 23.
FT REPEAT 2597 2703 CADHERIN 24.
FT REPEAT 2704 2810 CADHERIN 25.
FT REPEAT 2811 2913 CADHERIN 26.
FT REPEAT 2914 3013 CADHERIN 27.
FT REPEAT 3014 3124 CADHERIN 28.
FT REPEAT 3125 3229 CADHERIN 29.
FT REPEAT 3230 3334 CADHERIN 30.
FT REPEAT 3335 3439 CADHERIN 31.
FT REPEAT 3440 3545 CADHERIN 32.
FT REPEAT 3546 3651 CADHERIN 33.
FT REPEAT 3652 3756 CADHERIN 34.
FT DOMAIN 3950 4011 EGF-LIKE 1.
FT DOMAIN 4013 4049 EGF-LIKE 2.
FT DOMAIN 4052 4090 EGF-LIKE 3.
FT DOMAIN 4092 4128 EGF-LIKE 4.
FT DOMAIN 4321 4362 EGF-LIKE 5.
FT DISULFID 3954 3966 BY SIMILARITY.
FT DISULFID 3960 3999 BY SIMILARITY.
FT DISULFID 4001 4010 BY SIMILARITY.
FT DISULFID 4017 4028 BY SIMILARITY.
FT DISULFID 4022 4037 BY SIMILARITY.
FT DISULFID 4039 4048 BY SIMILARITY.
FT DISULFID 4056 4067 BY SIMILARITY.
FT DISULFID 4061 4078 BY SIMILARITY.
FT DISULFID 4080 4089 BY SIMILARITY.
FT DISULFID 4096 4107 BY SIMILARITY.
FT DISULFID 4101 4116 BY SIMILARITY.
FT DISULFID 4118 4127 BY SIMILARITY.
FT DISULFID 4325 4341 BY SIMILARITY.
FT DISULFID 4334 4350 BY SIMILARITY.
FT DISULFID 4352 4361 BY SIMILARITY.
FT CARBOHYD 239 239 POTENTIAL.
FT CARBOHYD 257 257 POTENTIAL.
FT CARBOHYD 276 276 POTENTIAL.
FT CARBOHYD 280 280 POTENTIAL.
FT CARBOHYD 402 402 POTENTIAL.
FT CARBOHYD 461 461 POTENTIAL.
FT CARBOHYD 605 605 POTENTIAL.
FT CARBOHYD 631 631 POTENTIAL.
FT CARBOHYD 1155 1155 POTENTIAL.
FT CARBOHYD 1367 1367 POTENTIAL.
FT CARBOHYD 1458 1458 POTENTIAL.
FT CARBOHYD 1751 1751 POTENTIAL.
FT CARBOHYD 1831 1831 POTENTIAL.
FT CARBOHYD 1880 1880 POTENTIAL.
FT CARBOHYD 2080 2080 POTENTIAL.
FT CARBOHYD 2171 2171 POTENTIAL.
FT CARBOHYD 2247 2247 POTENTIAL.
FT CARBOHYD 2290 2290 POTENTIAL.
FT CARBOHYD 2437 2437 POTENTIAL.
FT CARBOHYD 2581 2581 POTENTIAL.
FT CARBOHYD 2799 2799 POTENTIAL.
FT CARBOHYD 2920 2920 POTENTIAL.
FT CARBOHYD 2946 2946 POTENTIAL.
FT CARBOHYD 2967 2967 POTENTIAL.
FT CARBOHYD 3167 3167 POTENTIAL.
FT CARBOHYD 3303 3303 POTENTIAL.
FT CARBOHYD 3386 3386 POTENTIAL.
FT CARBOHYD 3389 3389 POTENTIAL.
FT CARBOHYD 3525 3525 POTENTIAL.
FT CARBOHYD 3852 3852 POTENTIAL.
FT CARBOHYD 3865 3865 POTENTIAL.
FT CARBOHYD 3905 3905 POTENTIAL.
FT CARBOHYD 4306 4306 POTENTIAL.
FT CARBOHYD 4414 4414 POTENTIAL.
FT CARBOHYD 4471 4471 POTENTIAL.
FT CARBOHYD 4487 4487 POTENTIAL.
FT CARBOHYD 4539 4539 POTENTIAL.
FT CARBOHYD 4550 4550 POTENTIAL.
FT VARIANT 1229 1229 S -> G.
FT VARIANT 1233 1233 G -> S.
SQ SEQUENCE 5147 AA; 564868 MW; 1EF20E13 CRC32;


Query Match 14.3%; Score 116; DB



Best Local Similarity 43.3%; Pred. No. 7.11e-08; 
Matches 13; Conservative 9; Mismatches 6; Indels 2; 

Db 4067 CQRSPDGSSYFCLCRPGFRGNQCESVSDSC 4096 

I: I :| :| MUM |::|: ::| 
Qy 2 CHISDQGEPY-CLCQPGFSGEHCQQ-ENPC 29 



RESULT 6 

ID CRB_DROME STANDARD; PRT; 2139 AA.

AC P10040; 

DT 01-MAR-1989 (REL. 10, CREATED) 

DT 01-MAY-1991 (REL. 18, LAST SEQUENCE UPDATE)

DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE)

DE CRUMBS PROTEIN PRECURSOR (95F). 

GN CRB.

OS DROSOPHILA MELANOGASTER (FRUIT FLY). 

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=OREGON-R; TISSUE=EMBRYO;

RX MEDLINE; 90263104.

RA TEPASS U., THERES C., KNUST E.;



RT "Crumbs encodes an EGF-like protein expressed on apical membranes of 

RT Drosophila epithelial cells and required for organization of

RT epithelia."; 

RL CELL 61:787-799(1990). 

RN [2] 

RP SEQUENCE OF 1663-1955 FROM N.A. 

RX MEDLINE; 87218537. 

RA KNUST E., DIETRICH U., TEPASS U., BREMER K.A., WEIGEL D.,
RA VAESSIN H., CAMPOS-ORTEGA J.A.;
RT "EGF homologous sequences encoded in the genome of Drosophila
RT melanogaster, and their relation to neurogenic genes.";
RL EMBO J. 6:761-766(1987).
CC -!- FUNCTION: MAY PLAY A ROLE IN THE DEVELOPMENT OF EPITHELIA,
CC POSSIBLY FOR THE ESTABLISHMENT AND/OR MAINTENANCE OF CELL
CC POLARITY. IT MAY ACT AS A SIGNAL.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- PTM: PHOSPHORYLATED IN THE CYTOPLASMIC DOMAIN (POTENTIAL).
CC -!- SIMILARITY: CONTAINS 29 EGF-LIKE DOMAINS.
CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; M33753; G552087; ALT_SEQ.

DR EMBL; X05144; E1746; -. 

DR EMBL; X05144; G929536; -. 

DR PIR; B26637; B26637. 

DR PIR; A35672; A35672. 

DR FLYBASE; FBgn0000368; crb. 

DR PROSITE; PS00010; ASX_HYDROXYL; 15.

DR PROSITE; PS00022; EGF_1; 26.

DR PROSITE; PS01186; EGF_2; 17.

DR PROSITE; PS01187; EGF_CA; 15. 

DR PFAM; PF00008; EGF; 26. 

DR PFAM; PF00054; laminin_G; 3.

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE; 

KW GLYCOPROTEIN; SIGNAL; PHOSPHORYLATION. 



FT SIGNAL 1 90
FT CHAIN 91 2139 CRUMBS PROTEIN.
FT DOMAIN 91 2084 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 2085 2111 POTENTIAL.
FT DOMAIN 2112 2139 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 267 303 EGF-LIKE 1.
FT DOMAIN 306 343 EGF-LIKE 2.
FT DOMAIN 348 386 EGF-LIKE 3.
FT DOMAIN 388 425 EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 427 463 EGF-LIKE 5.
FT DOMAIN 464 500 EGF-LIKE 6.
FT DOMAIN 501 532 EGF-LIKE 7.
FT DOMAIN 545 581 EGF-LIKE 8.
FT DOMAIN 582 611 EGF-LIKE 9.
FT DOMAIN 609 646 EGF-LIKE 10.
FT DOMAIN 648 685 EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 687 723 EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 725 761 EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 763 800 EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 802 838 EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 840 902 EGF-LIKE 16.
FT DOMAIN 904 940 EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 942 978 EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 980 1021 EGF-LIKE 19.
FT DOMAIN 1207 1243 EGF-LIKE 20.
FT DOMAIN 1481 1517 EGF-LIKE 21.
FT DOMAIN 1759 1795 EGF-LIKE 22.
FT DOMAIN 1797 1833 EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1835 1871 EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1874 1915 EGF-LIKE 25.
FT DOMAIN 1915 1951 EGF-LIKE 26.
FT DOMAIN 1953 1989 EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1991 2029 EGF-LIKE 28, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 2030 2070 EGF-LIKE 29.
FT DISULFID 271 282 BY SIMILARITY.
FT DISULFID 276 291 BY SIMILARITY.
FT DISULFID 293 302 BY SIMILARITY.
FT DISULFID 310 321 BY SIMILARITY.
FT DISULFID 315 331 BY SIMILARITY.
FT DISULFID 333 342 BY SIMILARITY.
FT DISULFID 352 363 BY SIMILARITY.
FT DISULFID 357 374 BY SIMILARITY.
FT DISULFID 376 385 BY SIMILARITY.
FT DISULFID 392 403 BY SIMILARITY.
FT DISULFID 397 412 BY SIMILARITY.
FT DISULFID 414 424 BY SIMILARITY.
FT DISULFID 431 442 BY SIMILARITY.
FT DISULFID 436 451 BY SIMILARITY.
FT DISULFID 453 462 BY SIMILARITY.
FT DISULFID 468 479 BY SIMILARITY.
FT DISULFID 473 488 BY SIMILARITY.
FT DISULFID 490 499 BY SIMILARITY.
FT DISULFID 505 515 BY SIMILARITY.
FT DISULFID 509 c?? BY SIMILARITY.
FT DISULFID 522 531 BY SIMILARITY.
FT DISULFID 549 562 BY SIMILARITY.
FT DISULFID 556 569 BY SIMILARITY.
FT DISULFID 571 580 BY SIMILARITY.
FT DISULFID 586 597 BY SIMILARITY.
FT DISULFID 591 602 BY SIMILARITY.
FT DISULFID 604 610 BY SIMILARITY.
FT DISULFID 613 624 BY SIMILARITY.
FT DISULFID 618 634 BY SIMILARITY.
FT DISULFID 636 645
Note: remainder of annotations omitted.

Query Match 13.9%; Score 113; DB 1; Length 2139; 

Best Local Similarity 43.8%; Pred. No. 2.92e-07; 

Matches 14; Conservative 8; Mismatches 8; Indels 2; Gaps 2; 

Db 1812 C-INQVAAFFCQCQPGFEGQHCEQNIDECADQ 1842 

I h: : :| Hill MM: : I I 
Qy 2 CHISDQGEPYCLCQPGFSGEHCQQE-NPCLGQ 32



RESULT 7 

ID NOTC_BRARE STANDARD; PRT; 2437 AA.
AC P46530;

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN PRECURSOR.

GN NOTCH. 

OS BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII;

OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA; 

OC CYPRINIDAE; RASBORINAE; DANIO. 

RN [1] 

RP SEQUENCE FROM N.A.

RC TISSUE=EMBRYO;

RX MEDLINE; 94128602.

RA BIERKAMP C., CAMPOS-ORTEGA J.A.;

RT "A zebrafish homologue of the Drosophila neurogenic gene Notch and 

RT its pattern of transcription during early embryogenesis."; 

RL MECH. DEV. 43:87-100(1993). 

CC -!- FUNCTION: IMPLICATED IN CELL FATE SPECIFICATIONS DURING
CC EMBRYO DEVELOPMENT. MAY BE INVOLVED IN THE FORMATION OF THE
CC NEURAL PLATE, NOTOCHORD AND BRAIN VESICLES.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DEVELOPMENTAL STAGE: EXPRESSED IN ALL CELLS IN PREGASTRULATION
CC STAGES. DURING GASTRULATION IS DIFFERENTIALLY EXPRESSED,
CC ACCUMULATING PREDOMINANTLY IN THE PRECHORDAL MESODERM AND
CC NOTOCHORD. AT THE END OF GASTRULATION, EXPRESSED ALONG THE
CC ANTERIOR-POSTERIOR AXIS INCLUDING THE DEVELOPING NEURAL PLATE
CC AND DIFFERENTIATING MESODERM. ALSO PRESENT IN THE DEVELOPING
CC BRAIN AND HEAD REGIONS.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 
CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; X69088; G433867; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 23.

DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 28.

DR PROSITE; PS01187; EGF_CA; 22.

DR PFAM; PF00008; EGF; 36.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN. 

POTENTIAL.
NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN.
EXTRACELLULAR (POTENTIAL).
POTENTIAL.
CYTOPLASMIC (POTENTIAL).
EGF-LIKE 1.
EGF-LIKE 2.
EGF-LIKE 3.
EGF-LIKE 4.
EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 6.
EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 10.
EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 22.
EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 26.
EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 28.
EGF-LIKE 29.
EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
EGF-LIKE 33.
EGF-LIKE 34.
EGF-LIKE 35.
EGF-LIKE 36.
BY SIMILARITY.
BY SIMILARITY.
BY SIMILARITY.
BY SIMILARITY.
BY SIMILARITY.
BY SIMILARITY.
BY SIMILARITY.
BY SIMILARITY.
Note: remainder of annotations omitted.

Query Match 13.9%; Score 113; DB 1; Length 2437; 

Best Local Similarity 51.6%; Pred. No. 2.92e-07; 

Matches 16; Conservative 5; Mismatches 7; Indels 3; 

Db 1361 C-VSGHLSPRCLCAPGFSGHECQTRMDSPCL 1390

I :| : I III HIM II : ::lll 
Qy 2 CHISDQGEPYCLCQPGFSGEHCQ-Q-ENPCL 30 



RESULT 8

ID NTC3_MOUSE STANDARD; PRT; 2318 AA.

AC Q61982;

DT 01-NOV-1997 (REL. 35, CREATED)

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS NOTCH 3 PROTEIN.

GN NOTCH3.

OS MUS MUSCULUS (MOUSE) . 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=ICR X SWISS WEBSTER;

RX MEDLINE; 95001556. 

RA LARDELLI M., DALSTRAND J., LENDAHL U.; 

RT "The novel Notch homologue mouse Notch 3 lacks specific epidermal 

RT growth factor-repeats and is expressed in proliferating

RT neuroepithelium.";

RL MECH. DEV. 46:123-136(1994). 

CC -!- FUNCTION: NOTCH 1, 2 AND 3 PLAY A COMBINATIONAL ROLE DURING
CC VARIOUS CELL FATE DECISIONS AND MORPHOLOGICAL MOVEMENTS IN THE
CC DEVELOPING CNS AND PROBABLY OTHER REGIONS OF THE EMBRYO.

CC -!- TISSUE SPECIFICITY: PROLIFERATING NEUROEPITHELIUM.

CC -!- DEVELOPMENTAL STAGE: CNS DEVELOPMENT.

CC -!- SIMILARITY: CONTAINS 34 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 CDC10/SWI6 REPEATS.

CC



CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; X74760; G483581; -. 

DR MGD; MGI: 99460; NOTCH3. 

DR PROSITE; PS00010; ASX_HYDROXYL; 18.

DR PROSITE; PS00022; EGF_1; 33.

DR PROSITE; PS01186; EGF_2; 27.

DR PROSITE; PS01187; EGF_CA; 17.

DR PFAM; PF00008; EGF; 33.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3.

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE;
EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).
lll:||::| III : :||:::
ID NTC1_RAT STANDARD; PRT; 2531 AA.

AC Q07008; 

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1 PRECURSOR.

GN NOTCH1.

OS RATTUS NORVEGICUS (RAT).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=SCHWANN CELL;

RX MEDLINE; 92111383. 

RA WEINMASTER G., ROBERTS V.J., LEMKE G.; 

RT "A homolog of Drosophila Notch expressed during mammalian 

RT development.";

RL DEVELOPMENT 113:199-205(1991). 

CC -!- FUNCTION: REQUIRED FOR THE CORRECT DIFFERENTIATION OF A NUMBER 
CC OF TISSUES. 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- DEVELOPMENTAL STAGE: IN THE EMBRYO, HIGHEST LEVELS OCCUR BETWEEN 
CC DAYS 12 AND 14 AND DECREASE RAPIDLY TO MUCH LOWER LEVELS IN THE 
CC ADULT. 

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS. 

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial
FT DISULFID 1101 1122 BY SIMILARITY.
FT DISULFID 144 155 BY SIMILARITY.
FT DISULFID 1116 1131 BY SIMILARITY.
FT DISULFID 1133 1142 BY SIMILARITY.
FT DISULFID 1149 1160 BY SIMILARITY.
FT DISULFID 1154 1169 BY SIMILARITY.
FT DISULFID 1171 1180 BY SIMILARITY.
FT DISULFID 1187 1198 BY SIMILARITY.
FT DISULFID 1192 1207 BY SIMILARITY.
FT DISULFID 1209 1218 BY SIMILARITY.
FT DISULFID 1225 1244 BY SIMILARITY.
FT DISULFID 1238 1253 BY SIMILARITY.
FT DISULFID 1255 1264 BY SIMILARITY.
FT DISULFID 1271 1284 BY SIMILARITY.
FT DISULFID 1276 1293 BY SIMILARITY.
FT DISULFID 1295 1304 BY SIMILARITY.
FT DISULFID 1311 1322 BY SIMILARITY.
FT DISULFID 1316 1334 BY SIMILARITY.
FT DISULFID 1336 1345 BY SIMILARITY.
FT DISULFID 1352 1363 BY SIMILARITY.
FT DISULFID 1357 1372 BY SIMILARITY.
FT DISULFID 1374 1383 BY SIMILARITY.
FT DISULFID 1391 1403 BY SIMILARITY.



Note: remainder of annotations omitted.

Query Match 13.7%; Score 111; DB 1; Length 2531;

Best Local Similarity 35.5%; Pred. No. 7.40e-07;

Matches 11; Conservative 12; Mismatches 7; Indels 1; 

Db 36 RCEVANGTEA-CVCSGAFVGQRCQDPSPCLS 65 

■\ ::: |: hi :| |::||: :|||: 
Qy 1 QCHISDQGEPYCLCQPGFSGEHCQQENPCLG 31



ID NTC1_MOUSE STANDARD; PRT; 2531 AA.
AC Q01705;
DT 01-NOV-1995 (REL. 32, CREATED)
DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1 PRECURSOR (MOTCH PROTEIN).
GN NOTCH1 OR MOTCH.
OS MUS MUSCULUS (MOUSE).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=EMBRYO;
RX MEDLINE; 93194170.
RA FRANCO DEL AMO F., GENDRON-MAGUIRE M., SWIATEK P.J., JENKINS N.A.,
RA COPELAND N.G., GRIDLEY T.;
RT "Cloning, analysis, and chromosomal localization of Notch-1, a mouse
RT homolog of Drosophila Notch.";
RL GENOMICS 15:259-264(1993).
RN [2]
RP SEQUENCE OF 1551-2170 FROM N.A.
RC TISSUE=EMBRYO;
RX MEDLINE; 93048835.
RA FRANCO DEL AMO F., SMITH D.E., SWIATEK P.J., GENDRON-MAGUIRE M.,
RA GREENSPAN R.J., MCMAHON A.P., GRIDLEY T.;
RT "Expression pattern of Motch, a mouse homolog of Drosophila Notch,
RT suggests an important role in early postimplantation mouse
RT development.";
RL DEVELOPMENT 115:737-744(1992).
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DEVELOPMENTAL STAGE: EXPRESSED ALMOST UNIFORMLY IN EARLY EMBRYOS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 
CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 
CC the European Bioinformatics Institute. There are no restrictions on its 
CC use. by non-profit institutions as long as its content is in no way 



modified and this statement is not removed. Usage by and for commercial 
entities requires a license agreement (See http://www.isb-sib.ch/announce/ 
or send an email to licenseJisb-sib.chj, 

DR EMBL; Z11886; G288503; -.
DR MGD; MGI:97363; NOTCH1.
DR PROSITE; PS00010; ASX_HYDROXYL; 22.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 27.
DR PROSITE; PS01187; EGF_CA; 21.
DR PFAM; PF00008; EGF; 35.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.



FT SIGNAL 1 18 POTENTIAL.
FT CHAIN 19 2531 NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1.
FT DOMAIN 19 1725 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 1726 1746 POTENTIAL.
FT DOMAIN 1747 2531 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 24 1425 36 X EGF-TYPE REPEATS.
FT DOMAIN 1449 1462 CYS-RICH.
FT DOMAIN 1445 1562 3 X LIN/NOTCH REPEATS.
FT REPEAT 1445 1480 LIN/NOTCH 1.
FT REPEAT 1481 1522 LIN/NOTCH 2.
FT REPEAT 1523 1562 LIN/NOTCH 3.
FT DOMAIN 1865 2075 6 X ANK MOTIF REPEATS.
FT REPEAT 1865 1910 ANK MOTIF 1.
FT REPEAT 1912 1942 ANK MOTIF 2.
FT REPEAT 1944 1975 ANK MOTIF 3.
FT REPEAT 1978 2009 ANK MOTIF 4.
FT REPEAT 2011 2042 ANK MOTIF 5.
FT REPEAT 2044 2075 ANK MOTIF 6.
FT CARBOHYD 959 959 POTENTIAL.
FT CARBOHYD 1179 1179 POTENTIAL.
FT CARBOHYD 1241 1241 POTENTIAL.
FT CARBOHYD 1489 1489 POTENTIAL.
FT CARBOHYD 1587 1587 POTENTIAL.



SQ SEQUENCE 2531 AA; 271312 MW; AD71189B CRC32; 

Query Match 13.7%; Score 111; DB 1; Length 2531; 

Best Local Similarity 24.2%; Pred. No. 7.40e-07; 

Matches 24; Conservative 28; Mismatches 39; Indels 8; Gaps 8;

Db 589 CLCQPGYTGHHCETNINECHSQPCRHGGTCQDRDNSYLCLCLKGTTGPNCEINLDDCASN 648 

II: : I I :| I : |:::| I :: I I |:: 
Qy 12 CLCQPGFSGEHCQQE-NPCLGQVVREV-I-R-RQKGYAS-CATASKVPIMECR-GGCGPQ 65

Db 649 PCDSGTCLDKIDGYECACEPGYTGSMCNVNIDECAGSPC 687 

I:: : ::|: ::: : : :: ||: :| 
Qy 66 CCQPTRSKRRKYVFQCTDGSSFVEEV-ERHL-ECGCLAC 102



RESULT 11 

ID NTC4_MOUSE STANDARD; PRT; 1964 AA.

AC P31695; Q62389; 

DT 01-JUL-1993 (REL. 26, CREATED) 

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 4 PRECURSOR (TRANSFORMING 

DE PROTEIN INT-3). 

GN NOTCH4 OR INT 3 OR INT-3. 

OS MUS MUSCULUS (MOUSE). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 92194507. 

RA ROBBINS J., BLONDEL B.J., GALLAHAN D., CALLAHAN R.;

RT "Mouse mammary tumor gene int-3: a member of the notch gene family 



RT transforms mammary epithelial cells.";

RL J. VIROL. 66:2594-2599(1992).

RN [2] 

RP REVISIONS, SEQUENCE FROM N.A.

RA CALLAHAN R.;

RL SUBMITTED (NOV-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.

RN [3]

RP SEQUENCE FROM N.A.

RC TISSUE=LUNG, AND TESTIS;

RX MEDLINE; 96281668.

RA UYTTENDAELE H., MARAZZI G., WU G., YAN Q., SASSOON D., KITAJEWSKI J.;

RT "Notch4/int-3, a mammary proto-oncogene, is an endothelial

RT cell-specific mammalian Notch gene.";

RL DEVELOPMENT 122:2251-2259(1996). 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- DISEASE: ACTIVATED INT-3 TRANSFORMS MAMMARY EPITHELIAL CELLS.

CC -!- SIMILARITY: CONTAINS 29 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 CDC10/SWI6 REPEATS.

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; M80456; G1714084; -.

DR EMBL; U43691; G1401160; -.

DR PIR; A38072; TVMVT3.

DR MGD; MGI:107471; NOTCH4.

DR PROSITE; PS00010; ASX_HYDROXYL; 11.

DR PROSITE; PS00022; EGF_1; 28.

DR PROSITE; PS01186; EGF_2; 21.

DR PROSITE; PS01187; EGF_CA; 9.

DR PFAM; PF00008; EGF; 26.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 2.

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE; 

KW GLYCOPROTEIN; PROTO-ONCOGENE; ANK REPEAT; SIGNAL. 



FT SIGNAL 1 20 POTENTIAL.
FT CHAIN 21 1964 NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 4.
FT DOMAIN 21 1443 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 1444 1464 POTENTIAL.
FT DOMAIN 1465 1964 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 21 60 EGF-LIKE 1.
FT DOMAIN 61 112 EGF-LIKE 2.
FT DOMAIN 115 152 EGF-LIKE 3.
FT DOMAIN 153 189 EGF-LIKE 4.
FT DOMAIN 191 229 EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 231 271 EGF-LIKE 6.
FT DOMAIN 273 309 EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 311 350 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 352 388 EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 389 427 EGF-LIKE 10.
FT DOMAIN 429 470 EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 472 508 EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 510 546 EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 548 584 EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 586 622 EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 622 656 EGF-LIKE 16.
FT DOMAIN 658 686 EGF-LIKE 17.
FT DOMAIN 688 724 EGF-LIKE 18.
FT DOMAIN 726 762 EGF-LIKE 19.
FT DOMAIN 764 800 EGF-LIKE 20.
FT DOMAIN 803 839 EGF-LIKE 21.
FT DOMAIN 841 877 EGF-LIKE 22.
FT DOMAIN 878 924 EGF-LIKE 23.
FT DOMAIN 926 962 EGF-LIKE 24.
FT DOMAIN 964 1000 EGF-LIKE 25.
FT DOMAIN 1002 1040 EGF-LIKE 26.
FT DOMAIN 1042 1081 EGF-LIKE 27.
FT DOMAIN 1083 1122 EGF-LIKE 28.
FT DOMAIN 1126 1167 EGF-LIKE 29.
FT DOMAIN 1168 1282 3 X LIN/NOTCH REPEATS.
FT REPEAT 1168 1208 LIN/NOTCH 1.
FT REPEAT 1209 1242 LIN/NOTCH 2.
FT REPEAT 1243 1282 LIN/NOTCH 3.
FT DOMAIN 1572 1785 6 X ANK MOTIF REPEATS.
FT REPEAT 1572 1603 ANK MOTIF 1.
FT REPEAT 1622 1653 ANK MOTIF 2.
FT REPEAT 1654 1685 ANK MOTIF 3.
FT REPEAT 1688 1719 ANK MOTIF 4.
FT REPEAT 1721 1752 ANK MOTIF 5.
FT REPEAT 1754 1785 ANK MOTIF 6.
FT DISULFID 25 38 BY SIMILARITY.
FT DISULFID 32 48 BY SIMILARITY.
FT DISULFID 50 59 BY SIMILARITY.
FT DISULFID 65 77 BY SIMILARITY.
FT DISULFID 71 100 BY SIMILARITY.
FT DISULFID 102 111 BY SIMILARITY.
FT DISULFID 119 130 BY SIMILARITY.
FT DISULFID 124 140 BY SIMILARITY.
FT DISULFID 142 151 BY SIMILARITY.
FT DISULFID 157 168 BY SIMILARITY.
FT DISULFID 162 177 BY SIMILARITY.
FT DISULFID 179 188 BY SIMILARITY.
FT DISULFID 195 208 BY SIMILARITY.
FT DISULFID 202 217 BY SIMILARITY.
FT DISULFID 219 228 BY SIMILARITY.
FT DISULFID 235 246 BY SIMILARITY.
FT DISULFID 240 259 BY SIMILARITY.
FT DISULFID 261 270 BY SIMILARITY.
FT DISULFID 277 288 BY SIMILARITY.
FT DISULFID 282 297 BY SIMILARITY.
FT DISULFID 299 308 BY SIMILARITY.
FT DISULFID 315 329 BY SIMILARITY.
FT DISULFID 323 338 BY SIMILARITY.
FT DISULFID 340 349 BY SIMILARITY.
FT DISULFID 356 367 BY SIMILARITY.
FT DISULFID 361 376 BY SIMILARITY.
FT DISULFID 378 387 BY SIMILARITY.
FT DISULFID 393 BY SIMILARITY.
FT DISULFID 398 415 BY SIMILARITY.
FT DISULFID 417 426 BY SIMILARITY.
FT DISULFID 433 449 BY SIMILARITY.
FT DISULFID 443 458 BY SIMILARITY.
FT DISULFID 460 469 BY SIMILARITY.
FT DISULFID 476 487 BY SIMILARITY.
FT DISULFID 481 496 BY SIMILARITY.
FT DISULFID 498 507 BY SIMILARITY.
FT DISULFID 514 525 BY SIMILARITY.
FT DISULFID 519 534 BY SIMILARITY.
FT DISULFID 536 545 BY SIMILARITY.
FT DISULFID 552 563 BY SIMILARITY.
FT DISULFID 557 572 BY SIMILARITY.
FT DISULFID 574 583 BY SIMILARITY.
FT DISULFID 590 601 BY SIMILARITY.
FT DISULFID 595 610 BY SIMILARITY.
FT DISULFID 612 621 BY SIMILARITY.
FT DISULFID 626 637 BY SIMILARITY.
FT DISULFID 631 646 BY SIMILARITY.
FT DISULFID 648 655 BY SIMILARITY.
FT DISULFID 662 669 BY SIMILARITY.
FT DISULFID 664 674 BY SIMILARITY.
FT DISULFID 676 685 BY SIMILARITY.
FT DISULFID 692 703 BY SIMILARITY.
FT DISULFID 697 712 BY SIMILARITY.
FT DISULFID 714 723 BY SIMILARITY.
FT DISULFID 730 741 BY SIMILARITY.
FT DISULFID 735 750 BY SIMILARITY.
FT DISULFID

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID
FT DISULFID
FT DISULFID
FT DISULFID

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 
FT DISULFID

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT CARBOHYD 

FT CARBOHYD 

FT CARBOHYD 

FT CONFLICT 

FT CONFLICT 

FT CONFLICT 



752 

768 

773 

790 

807 

812 

829 

845 

850 

867 

882 

897 

914 

930 

935 

952 

968 

973 

990 999 
1006 1019 
1011 1028 
1030 1039 
1046 1057 
1051 1069 
1071 1080 
1087 1098 
1092 1110 
1112 1121 
1130 1142 



761 
779 
788 
799 
818 
827 
838 
856 
865 
876 
903 
912 
923 
941 
950 
961 
979 



1157 
711 
960 



1166 
711 
960 



1139 1139 
43 43 
298 298 



BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY.
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY.
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY.
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
POTENTIAL. 
POTENTIAL.
POTENTIAL. 
0 -> R (IN REF. 3). 
L -> P (IN REF. 3).
M -> K (IN REF. 3).



Note: remainder of annotations omitted. 

Query Match 13.5%; Score 110; DB 1; Length 1964;

Best Local Similarity 43.8%; Pred. No. 1.18e-06;

Matches 14; Conservative 7; Mismatches 9; Indels 2;

Db 449 C-INTPGSFNCLCLPGYTGSRCEADHNECLSQ 479

A |: HI l:: : ' : : > :|: 

Qy 2 CHISDQGEPYCLCQPGFSGEHCQQE-NPCLGQ 32



RESULT 12 

ID NOTC_XENLA STANDARD; PRT; 2524 AA.

AC P21783;

DT 01-MAY-1991 (REL. 18, CREATED)

DT 01-OCT-1996 (REL. 34, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG PRECURSOR (XOTCH PROTEIN).

GN XOTCH.

OS XENOPUS LAEVIS (AFRICAN CLAWED FROG).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; AMPHIBIA; BATRACHIA; ANURA;

OC MESOBATRACHIA; PIPOIDEA; PIPIDAE; XENOPODINAE; XENOPUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 90385285. 

RA COFFMAN C., HARRIS W., KINTNER C.;

RT "xotch, the Xenopus homolog of Drosophila notch."; 

RL SCIENCE 249:1438-1441(1990). 

RN [2] 

RP REVISIONS TO 1759-1782.

RA KINTNER C.;

RL SUBMITTED (JUN-1996) TO EMBL/GENBANK/DDBJ DATA BANKS. 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN. 

CC -!- DEVELOPMENTAL STAGE: EXPRESSED ALMOST UNIFORMLY IN EARLY EMBRYOS. 

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; M33874; G1364263; -.

DR PIR; A35844; A35844.

DR PROSITE; PS00010; ASX_HYDROXYL; 23.

DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 29.

DR PROSITE; PS01187; EGF_CA; 21.

DR PFAM; PF00008; EGF; 36. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN. 



FT SIGNAL 1 19 POTENTIAL.
FT CHAIN 20 2524 NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG.
FT DOMAIN 20 1728 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 1729 1750 POTENTIAL.
FT DOMAIN 1751 2524 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 20 57 EGF-LIKE 1.
FT DOMAIN 58 99 EGF-LIKE 2.
FT DOMAIN 102 140 EGF-LIKE 3.
FT DOMAIN 141 177 EGF-LIKE 4.
FT DOMAIN 179 215 EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 217 254 EGF-LIKE 6.
FT DOMAIN 256 292 EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 294 332 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 334 370 EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 371 409 EGF-LIKE 10.
FT DOMAIN 411 449 EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 451 487 EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 489 525 EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 527 563 EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 565 600 EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 602 638 EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 640 675 EGF-LIKE 17.
FT DOMAIN 677 713 EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 715 750 EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 752 788 EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 790 826 EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 828 866 EGF-LIKE 22.
FT DOMAIN 868 904 EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 906 942 EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 944 980 EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 982 1018 EGF-LIKE 26.
FT DOMAIN 1020 1056 EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1058 1094 EGF-LIKE 28.
FT DOMAIN 1096 1142 EGF-LIKE 29.
FT DOMAIN 1144 1180 EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1182 1218 EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1220 1264 EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1266 1304 EGF-LIKE 33.
FT DOMAIN 1306 1346 EGF-LIKE 34.
FT DOMAIN 1347 1383 EGF-LIKE 35.
FT DOMAIN 1386 1424 EGF-LIKE 36.
FT DOMAIN 1441 1560 3 X LIN/NOTCH REPEATS.
FT REPEAT 1441 1478 LIN/NOTCH 1.
FT REPEAT 1479 1520 LIN/NOTCH 2.
FT REPEAT 1521 1560 LIN/NOTCH 3.
FT DOMAIN 1871 2083 6 X ANK MOTIF REPEATS.
FT DISULFID 22 35 BY SIMILARITY.
FT DISULFID 29 45 BY SIMILARITY.
FT DISULFID 47 56 BY SIMILARITY.
FT DISULFID 62 74 BY SIMILARITY.


FT DISULFID 68 87 BY SIMILARITY.
FT DISULFID 89 98 BY SIMILARITY.
FT DISULFID 106 117 BY SIMILARITY.
FT DISULFID 111 128 BY SIMILARITY.
FT DISULFID 130 139 BY SIMILARITY.
FT DISULFID 145 156 BY SIMILARITY.
FT DISULFID 150 165 BY SIMILARITY.
FT DISULFID 167 176 BY SIMILARITY.
FT DISULFID 183 194 BY SIMILARITY.
FT DISULFID 188 203 BY SIMILARITY.
FT DISULFID 205 214 BY SIMILARITY.
FT DISULFID 221 232 BY SIMILARITY.
FT DISULFID 226 242 BY SIMILARITY.
FT DISULFID 244 253 BY SIMILARITY.
FT DISULFID 260 271 BY SIMILARITY.
FT DISULFID 265 280 BY SIMILARITY.
FT DISULFID 282 291 BY SIMILARITY.
FT DISULFID 298 311 BY SIMILARITY.
FT DISULFID 305 320 BY SIMILARITY.
FT DISULFID 322 331 BY SIMILARITY.
FT DISULFID 338 349 BY SIMILARITY.
FT DISULFID 343 358 BY SIMILARITY.
FT DISULFID 360 369 BY SIMILARITY.
FT DISULFID 375 386 BY SIMILARITY.
FT DISULFID 380 397 BY SIMILARITY.
FT DISULFID 399 408 BY SIMILARITY.
FT DISULFID 415 428 BY SIMILARITY.
FT DISULFID 422 437 BY SIMILARITY.
FT DISULFID 439 448 BY SIMILARITY.
FT DISULFID 455 466 BY SIMILARITY.
FT DISULFID 460 475 BY SIMILARITY.
FT DISULFID 477 486 BY SIMILARITY.
FT DISULFID 493 504 BY SIMILARITY.
FT DISULFID 498 513 BY SIMILARITY.
FT DISULFID 515 524 BY SIMILARITY.
FT DISULFID 531 542 BY SIMILARITY.
FT DISULFID 536 551 BY SIMILARITY.
FT DISULFID 553 562 BY SIMILARITY.
FT DISULFID 569 579 BY SIMILARITY.
FT DISULFID 574 588 BY SIMILARITY.
FT DISULFID 590 599 BY SIMILARITY.
FT DISULFID 606 617 BY SIMILARITY.
FT DISULFID 611 626 BY SIMILARITY.
FT DISULFID 628 637 BY SIMILARITY.
FT DISULFID 644 654 BY SIMILARITY.
FT DISULFID 649 663 BY SIMILARITY.
FT DISULFID 665 674 BY SIMILARITY.
FT DISULFID 681 692 BY SIMILARITY.
FT DISULFID 686 701 BY SIMILARITY.
FT DISULFID 703 712 BY SIMILARITY.
FT DISULFID 719 729 BY SIMILARITY.
FT DISULFID 724 738 BY SIMILARITY.
FT DISULFID 740 749 BY SIMILARITY.
FT DISULFID 756 767 BY SIMILARITY.
FT DISULFID 761 776 BY SIMILARITY.
FT DISULFID 778 787 BY SIMILARITY.
FT DISULFID 794 805 BY SIMILARITY.
FT DISULFID 799 814 BY SIMILARITY.
FT DISULFID 816 825 BY SIMILARITY.
FT DISULFID 832 843 BY SIMILARITY.
FT DISULFID 837 854 BY SIMILARITY.
FT DISULFID 856 865 BY SIMILARITY.
FT DISULFID 872 883 BY SIMILARITY.
FT DISULFID 877 892 BY SIMILARITY.
FT DISULFID 894 903 BY SIMILARITY.
FT DISULFID 910 921 BY SIMILARITY.
FT DISULFID 915 930 BY SIMILARITY.
FT DISULFID 932 941 BY SIMILARITY.
FT DISULFID 986 997 BY SIMILARITY.
FT DISULFID 991 1006 BY SIMILARITY.
FT DISULFID 1008 1017 BY SIMILARITY.
FT DISULFID 1024 1035 BY SIMILARITY.
FT DISULFID 1029 1044 BY SIMILARITY.
FT DISULFID 1046 1055 BY SIMILARITY.
FT DISULFID 1062 1073 BY SIMILARITY.
FT DISULFID 1067 1082 BY SIMILARITY.
FT DISULFID 1084 1093 BY SIMILARITY.
FT DISULFID 1100 1121 BY SIMILARITY.
FT DISULFID 1115 1130 BY SIMILARITY.
FT DISULFID 1132 1141 BY SIMILARITY.
FT DISULFID 1148 1159 BY SIMILARITY.
FT DISULFID 1153 1168 BY SIMILARITY.
FT DISULFID 1170 1179 BY SIMILARITY.
FT DISULFID 1186 1197 BY SIMILARITY.
FT DISULFID 1191 1206 BY SIMILARITY.
FT DISULFID 1208 1217 BY SIMILARITY.
FT DISULFID 1224 1243 BY SIMILARITY.
FT DISULFID 1237 1252 BY SIMILARITY.
FT DISULFID 1254 1263 BY SIMILARITY.
FT DISULFID 1270 1283 BY SIMILARITY.
FT DISULFID 1275 1292 BY SIMILARITY.
FT DISULFID 1294 1303 BY SIMILARITY.
FT DISULFID 1310 1321 BY SIMILARITY.
FT DISULFID 1315 1333 BY SIMILARITY.
FT DISULFID 1335 1344 BY SIMILARITY.
FT DISULFID 1351 1362 BY SIMILARITY.
FT DISULFID 1356 1371 BY SIMILARITY.
FT DISULFID 1373 1382 BY SIMILARITY.
FT DISULFID 1390 1401 BY SIMILARITY.
FT DISULFID 1395 1412 BY SIMILARITY.
FT DISULFID 1414 1423 BY SIMILARITY.
FT CARBOHYD 462 462 POTENTIAL.
FT CARBOHYD 887 887 POTENTIAL.

Note: remainder of annotations omitted.

Query Match 13.3%; Score 108; DB 1; Length 2524;

Best Local Similarity 22.2%; Pred. No. 2.95e-06;

Matches 22; Conservative 30; Mismatches 39; Indels 8; Gaps

Db 588 CLCRPGYTGRLCDNDINECLSKPCLNGGQCTDRENGYICTCPKGTTGVNCETKIDDCASN 647

lll:M::| 1: : 1 II: : : |::M :|: :: 1 : |::
Qy 12 CLCQPGFSGEHCQQE-NPCLGQV-VR--EVIRRQKGYA-SCATASKVPIMECR-GGCGPQ 65

Db 648 LCDNGKCIDKIDGYECTCEPGYTGKLCNININECDSNPC 686

1: ::M " : : : :: II :l
Qy 66 CCQPTRSKRRKYVFQCTDGSSFVEEV-ERHL-ECGCLAC 102



RESULT 13

ID VWF_HUMAN STANDARD; PRT; 2813 AA.

AC P04275;

DT 20-MAR-1987 (REL. 04, CREATED)

DT 01-JUL-1993 (REL. 26, LAST SEQUENCE UPDATE)

DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE)

DE VON WILLEBRAND FACTOR PRECURSOR.

GN F8VWF OR VWF.

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO.

RN [1]

RP SEQUENCE FROM N.A.

RX MEDLINE; 90062044.

RA MANCUSO D.J., TULEY E.A., WESTFIELD L.A., WORRALL N.K.,

RA SHELTON-INLOES B.B., SORACE J.M., ALEVY Y.G., SADLER J.E.;

RT "Structure of the gene for human von Willebrand factor.";

RL J. BIOL. CHEM. 264:19514-19527(1989).

RN [2]

RP SEQUENCE FROM N.A.

RX MEDLINE; 87016349.

RA BONTHRON D., ORR E.C., MITSOCK L.M., MITSOCK L.M., GINSBURG D.,

RA HANDIN R.I., ORKIN S.H.;

RT "Nucleotide sequence of pre-pro-von Willebrand factor cDNA.";

RL NUCLEIC ACIDS RES. 14:7125-7128(1986).

RN [3]
SEQUENCE OF 1-1400 FROM N.A.
MEDLINE; 87004550. 

VERWEIJ C.L., DIERGAARDE P.J., HART M., PANNEKOEK H.;
"Full-length von Willebrand factor (vWF) cDNA encodes a highly 
repetitive protein considerably larger than the mature vWF subunit."; 
EMBO J. 5:1839-1847(1986). 
[4] 






VERWEIJ C.L., DIERGAARDE P.J., HART M., PANNEKOEK H.; 

EMBO J. 5:3074-3074(1986). 

[5] 

SEQUENCE OF 764-2813. 
MEDLINE; 86269895. 

TITANI K., KUMAR S., TAKIO K., ERICSSON L.H., WADE R.D., ASHIDA K., 
WALSH K.A., CHOPEK M.W., SADLER J.E., FUJIKAWA K.;

"Amino acid sequence of human von Willebrand factor.";

BIOCHEMISTRY 25:3171-3184(1986). 
[6] 

SEQUENCE OF 781-1424 FROM N.A. 
MEDLINE; 86269894. 

SHELTON-INLOES B.B., TITANI K., SADLER J.E.;

"cDNA sequences for human von Willebrand factor reveal five types of 

repeated domains and five possible protein sequence polymorphisms."; 

BIOCHEMISTRY 25:3164-3171(1986). 

[7] 

SEQUENCE OF 764-873 AND 1289-2813 FROM N.A. 
MEDLINE; 86016708. 

SADLER J.E., SHELTON-INLOES B.B., SORACE J.M., HARLAN J.M.,
TITANI K., DAVIE E.W.; 

"Cloning and characterization of two cDNAs coding for human von 
Willebrand factor."; 

PROC. NATL. ACAD. SCI. U.S.A. 82:6394-6398(1985). 
[8] 

SEQUENCE OF 2731-2813 FROM N.A. 
MEDLINE; 85269603. 

VERWEIJ C.L., DE VRIES C.J.M., DISTEL B., VAN ZONNEVELD A.-J.,
GEURTS VAN KESSEL A., VAN MOURIK J.A., PANNEKOEK H.;
"Construction of cDNA coding for human von Willebrand factor using
antibody probes for colony-screening and mapping of the chromosomal
gene.";
NUCLEIC ACIDS RES. 13:4699-4717(1985).
[9]

SEQUENCE OF 1-177 FROM N.A. 
MEDLINE; 88111704. 
BONTHRON D., ORKIN S.H.;

"The human von Willebrand factor gene. Structure of the 5' region.";

EUR. J. BIOCHEM. 171:51-57(1988).

[10] 

SEQUENCE OF 2731-2813 FROM N.A. 
MEDLINE; 87260814. 

COLLINS C.J., UNDERDAHL J.P., LEVENE R.B., RAVERA C.P.,
MORIN M.J., DOMBALAGIAN M.J., RICCA G., LIVINGSTON D.H.,
LYNCH D.C.;

"Molecular cloning of the human gene for von Willebrand factor and
identification of the transcription initiation site."; 
PROC. NATL. ACAD. SCI. U.S.A. 84:4393-4397(1987). 
[11] 

DISULFIDE BONDS. 
MEDLINE; 88163465. 

MARTI T., ROSSELET S.J., TITANI K., WALSH K.A.; 

"Identification of disulfide-bridged substructures within human von 

Willebrand factor.";

BIOCHEMISTRY 26:8099-8109(1987). 

[12] 

STRUCTURE OF CARBOHYDRATES.
MEDLINE; 86274702.

SAMOR B., MICHALSKI J.C., DEBRAY H., MAZURIER C., GOUDEMAND M.,

VAN HALBEEK H., VLIEGENTHART J.F.G., MONTREUIL J.;

"Primary structure of a new tetraantennary glycan of the N-

acetyllactosaminic type isolated from human factor VIII/von

Willebrand factor.";

EUR. J. BIOCHEM. 158:295-298(1986).

[13] 



X-RAY CRYSTALLOGRAPHY (1.8 ANGSTROMS) OF 1685-1873.
MEDLINE; 97472999.

HUIZINGA E.G., MARTIJN VAN DER PLAS R., KROON J., SIXMA J.J., GROS P.;
"Crystal structure of the A3 domain of human von Willebrand factor: 
implications for collagen binding."; 
STRUCTURE 5:1147-1156(1997). 
[14] 

X-RAY CRYSTALLOGRAPHY (2.2 ANGSTROMS) OF 1686-1872. 
MEDLINE; 97460108. 

BIENKOWSKA J., CRUZ M., ATIEMO A., HANDIN R., LIDDINGTON R.;
"The von Willebrand factor A3 domain does not contain a metal ion-
dependent adhesion site motif.";
J. BIOL. CHEM. 272:25162-25167(1997). 
[15] 

VARIANTS TRP-1597 AND ASP-1607.
MEDLINE; 89264495.

GINSBURG D., KONKLE B.A., GILL J.C., MONTGOMERY R.R.,

BOCKENSTEDT P.L., JOHNSON T.A., YANG A.Y.;

"Molecular basis of human von Willebrand disease: analysis of

platelet von Willebrand factor mRNA.";

PROC. NATL. ACAD. SCI. U.S.A. 86:3723-3727(1989). 

[16] 

VARIANT THR-1628.
MEDLINE; 91196734.

IANNUZZI M.C., HIDAKA N., BOEHNKE M., BRUCK M.E., HANNA W.T.,
COLLINS F.S., GINSBURG D.;

"Analysis of the relationship of von Willebrand disease (vWD) and
hereditary hemorrhagic telangiectasia and identification of a
potential type IIA vWD mutation (Ile865 to Thr).";
AM. J. HUM. GENET. 48:757-763(1991). 
[17] 

VARIANTS NORMANDY-2 AND NORMANDY-3.
MEDLINE; 92001464.

GAUCHER C., MERCIER B., JORIEUX S., OUFKIR D., MAZURIER C.;
"Identification of two point mutations in the von Willebrand factor
gene of three families with the 'Normandy' variant of von Willebrand
disease.";
BR. J. HAEMATOL. 78:506-514(1991).
[18] 

VARIANT CYS-1308. 
MEDLINE; 92104315. 

DONNER M., ANDERSSON A.-M., KRISTOFFERSSON A.-C., NILSSON I.M.,
DAHLBACK B., HOLMBERG L.; 

"An Arg545-->Cys545 substitution mutation of the von Willebrand 
factor in type IIB von Willebrand's disease."; 
EUR. J. HAEMATOL. 47:342-345(1991). 
[19] 

VARIANTS TRP-1306; CYS-1308 AND PRO-1613. 
MEDLINE; 91185601. 

RANDI A.M., RABINOWITZ I., MANCUSO D.J., MANNUCCI P.M., SADLER J.E.;

"Molecular basis of von Willebrand disease type IIB. Candidate
mutations cluster in one disulfide loop between proposed platelet
glycoprotein Ib binding sequences.";
J. CLIN. INVEST. 87:1220-1226(1991). 
[20] 

VARIANTS TRP-1306; CYS-1308; MET-1316; GLN-1341 AND HIS-1399. 
MEDLINE; 91185602. 

COONEY K.A., NICHOLS W.C., BRUCK M.E., BAHOU W.F., SHAPIRO A.D.,
BOWIE E.J.W., GRALNICK H.R., GINSBURG D.;

"The molecular defect in type IIB von Willebrand disease.
Identification of four potential missense mutations within the
putative GpIb binding domain.";
J. CLIN. INVEST. 87:1227-1233(1991).
[21] 

VARIANT CYS-1313.
MEDLINE; 91187908.

WARE J., DENT J.A., AZUMA H., SUGIMOTO M., KYRLE P.A., YOSHIOKA A.,
RUGGERI Z.M.;

"Identification of a point mutation in type IIB von Willebrand
disease illustrating the regulation of von Willebrand factor affinity
for the platelet membrane glycoprotein Ib-IX receptor.";
PROC. NATL. ACAD. SCI. U.S.A. 88:2946-2950(1991).
[22] 
RP VARIANT NORMANDY-1.

RX MEDLINE; 91296824.

RA TULEY E.A., GAUCHER C., JORIEUX S., WORRALL N.K., SADLER J.E.,

RA MAZURIER C.;

RT "Expression of von Willebrand factor 'Normandy': an autosomal

RT mutation that mimics hemophilia A.";

RL PROC. NATL. ACAD. SCI. U.S.A. 88:6377-6381(1991).

RN [23] 

RP VARIANT MET-1316.

RX MEDLINE; 92109240. 

RA MURRAY E.W., GILES A.R., LILLICRAP D.; 

RT "Germ-line mosaicism for a valine-to-methionine substitution at

RT residue 553 in the glycoprotein Ib-binding domain of von Willebrand

RT factor, causing type IIB von Willebrand disease.";

RL AM. J. HUM. GENET. 50:199-207(1992).

RN [24] 

RP VARIANTS TRP-1306; MET-1316; THR-1628 AND SER-1648.

RX MEDLINE; 93042596.

RA PIETU G., RIBBA A.S., DE PAILLETTE L., CHEREL G., LAVERGNE J.M.,

RA BAHNAK B.R., MEYER D.; 

RT "Molecular study of von Willebrand disease: identification of

RT potential mutations in patients with type IIA and type IIB.";

RL BLOOD COAGUL. FIBRINOLYSIS 3:415-421(1992).

RN [25] 

RP VARIANTS TRP-1306; CYS-1308; LEU-1314 AND LEU-1318.

RX MEDLINE; 93041230.

RA DONNER M., KRISTOFFERSSON A.C., LENK H., SCHEIBEL E., DAHLBACK B.,

RA NILSSON I.M., HOLMBERG L.; 

RT "Type IIB von Willebrand's disease: gene mutations and clinical 

Note: remainder of annotations omitted. 

Query Match 12.9%; Score 105; DB 1; Length 2813; 

Best Local Similarity 41.2%; Pred. No. 1.16e-05;

Matches 14; Conservative 5; Mismatches 15; Indels 0; Gaps 

Db 2773 CCSPTRTEPMQVALHCTNGSVVYHEVLNAMECKC 2806

II III: ::||:H II Ml I 
Qy 66 CCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99 



ID PGBM_MOUSE STANDARD; PRT; 3707 AA.
AC Q05793;
DT 01-NOV-1995 (REL. 32, CREATED)
DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE BASEMENT MEMBRANE-SPECIFIC HEPARAN SULFATE PROTEOGLYCAN CORE
DE PROTEIN PRECURSOR (HSPG) (PERLECAN) (PLC).
GN HSPG2.
OS MUS MUSCULUS (MOUSE).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=MELANOMA;
RX MEDLINE; 92078153.
RA NOONAN D.M., FULLE A., VALENTE P., CAI S., HORIGAN E., SASAKI M.,
RA YAMADA Y., HASSELL J.R.;
RT "The complete sequence of perlecan, a basement membrane heparan
RT sulfate proteoglycan, reveals extensive similarity with laminin A
RT chain, low density lipoprotein-receptor, and the neural cell adhesion
RT molecule.";
RL J. BIOL. CHEM. 266:22939-22947(1991).
RN [2]
RP SEQUENCE OF 940-1601 AND 1870-2600 FROM N.A., AND PARTIAL SEQUENCE.
RX MEDLINE; 89034110.
RA NOONAN D.M., HORIGAN E.A., LEDBETTER S.R., VOGELI G., SASAKI M.,
RA YAMADA Y., HASSELL J.R.;
RT "Identification of cDNA clones encoding different domains of the
RT basement membrane heparan sulfate proteoglycan.";
RL J. BIOL. CHEM. 263:16379-16387(1988).
CC -!- FUNCTION: THIS PROTEIN IS AN INTEGRAL COMPONENT OF BASEMENT
CC MEMBRANES. IT IS RESPONSIBLE FOR THE FIXED NEGATIVE ELECTROSTATIC
CC CHARGE AND IS INVOLVED IN THE CHARGE-SELECTIVE ULTRAFILTRATION
CC PROPERTIES. IT INTERACTS WITH OTHER BASEMENT MEMBRANE COMPONENTS
CC SUCH AS LAMININ AND COLLAGEN TYPE IV AND SERVES AS AN ATTACHMENT
CC SUBSTRATE FOR CELLS.
CC -!- SUBUNIT: PURIFIED PERLECAN HAS A STRONG TENDENCY TO AGGREGATE IN
CC DIMERS OR STELLATE STRUCTURES.
CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR.
CC -!- TISSUE SPECIFICITY: FOUND IN THE BASEMENT MEMBRANES.
CC -!- PTM: CONTAINS THREE HEPARAN SULFATE CHAINS AS WELL AS N-LINKED
CC AND O-LINKED OLIGOSACCHARIDES.
CC -!- SIMILARITY: CONTAINS 4 LDL-RECEPTOR CLASS A DOMAINS.
CC -!- SIMILARITY: CONTAINS 10.5 LAMININ EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LAMININ DOMAINS IV.
CC -!- SIMILARITY: BELONGS TO THE IMMUNOGLOBULIN SUPERFAMILY. CONTAINS
CC 15 C2-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 2 LAMININ G-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 2 EGF-LIKE DOMAINS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; M77174; G200296; -.
DR EMBL; J04054; G200253; -.
DR EMBL; J04055; G200301; -.
DR MGD; MGI:96257; HSPG2.
DR PROSITE; PS00022; EGF_1; 8.
DR PROSITE; PS01186; EGF_2; 5.
DR PROSITE; PS01209; LDLRA_1; 4.
DR PROSITE; PS01248; LAMININ_TYPE_EGF; 11.
DR PROSITE; PS50068; LDLRA_2; 4.
DR PFAM; PF00047; ig; 14.
DR PFAM; PF00052; laminin_B; 3.
DR PFAM; PF00053; laminin_EGF; 8.
DR PFAM; PF00054; laminin_G; 3.
DR PFAM; PF00057; ldl_recept_a; 4.
DR HSSP; P01130; 1AJJ.
KW SIGNAL; BASEMENT MEMBRANE; PROTEOGLYCAN; REPEAT; GLYCOPROTEIN;
KW HEPARAN SULFATE; LAMININ EGF-LIKE DOMAIN; IMMUNOGLOBULIN FOLD;
KW EXTRACELLULAR MATRIX; EGF-LIKE DOMAIN.
FT   SIGNAL        1     21       POTENTIAL.
FT   CHAIN        22   3707       BASEMENT MEMBRANE-SPECIFIC HEPARAN
FT                                SULFATE PROTEOGLYCAN CORE PROTEIN.
FT   DOMAIN       22    193       DOMAIN I (UNIQUE, CONTAINS 3 HS SIDE
FT                                CHAINS).
FT   DOMAIN      194    403       DOMAIN II (4 LDLRA REPEATS).
FT   DOMAIN      404    504       DOMAIN IIA (1 IGG-REPEAT).
FT   DOMAIN      507   1676       DOMAIN III (SIMILAR TO SHORT ARM OF
FT                                LAMININ A CHAIN).
FT   DOMAIN     1677   2980       DOMAIN IV (SIMILAR TO NEURAL CELL
FT                                ADHESION MOLECULE; 14 IGG REPEATS).
FT   DOMAIN     2981   3707       DOMAIN V (C-TERMINAL G-DOMAIN OF LAMININ
FT                                ALPHA CHAINS AND EGF).
FT   DOMAIN      194    234       LDL-RECEPTOR CLASS A 1.
FT   DOMAIN      281    319       LDL-RECEPTOR CLASS A 2.
FT   DOMAIN      320    359       LDL-RECEPTOR CLASS A 3.
FT   DOMAIN      360    403       LDL-RECEPTOR CLASS A 4.
FT   DOMAIN      404    504       IG-LIKE C2-TYPE DOMAIN 1.
FT   DOMAIN      521    530       LAMININ EGF-LIKE 1 (N-TERMINAL).
FT   DOMAIN      531    730       LAMININ DOMAIN IV 1 (DOMAIN III A).
FT   DOMAIN      731    763       LAMININ EGF-LIKE 1 (C-TERMINAL).
FT   DOMAIN      764    813       LAMININ EGF-LIKE 2.
FT   DOMAIN      814    871       LAMININ EGF-LIKE 3.
FT   DOMAIN      879    923       LAMININ EGF-LIKE 4 (INCOMPLETE).
FT   DOMAIN      924    933       LAMININ EGF-LIKE 5 (N-TERMINAL).
FT   DOMAIN      934   1125       LAMININ DOMAIN IV 2 (DOMAIN III B).
FT   DOMAIN     1126   1158       LAMININ EGF-LIKE 5 (C-TERMINAL).
FT   DOMAIN     1159   1208       LAMININ EGF-LIKE 6.
FT   DOMAIN     1209   1265       LAMININ EGF-LIKE 7.
FT   DOMAIN     1275   1324       LAMININ EGF-LIKE 8.
FT   DOMAIN     1325   1334       LAMININ EGF-LIKE 9 (N-TERMINAL).
FT   DOMAIN     1335   1529       LAMININ DOMAIN IV 3 (DOMAIN III C).
FT   DOMAIN     1530   1562       LAMININ EGF-LIKE 9 (C-TERMINAL).
FT   DOMAIN     1563   1612       LAMININ EGF-LIKE 10.
FT   DOMAIN     1613   1670       LAMININ EGF-LIKE 11.
FT   DOMAIN     1677   1771       IG-LIKE C2-TYPE DOMAIN 2.
FT   DOMAIN     1772   1865       IG-LIKE C2-TYPE DOMAIN 3.
FT   DOMAIN     1866   1954       IG-LIKE C2-TYPE DOMAIN 4.
FT   DOMAIN     1955   2049       IG-LIKE C2-TYPE DOMAIN 5.
FT   DOMAIN     2050   2148       IG-LIKE C2-TYPE DOMAIN 6.
FT   DOMAIN     2149   2244       IG-LIKE C2-TYPE DOMAIN 7.
FT   DOMAIN     2245   2343       IG-LIKE C2-TYPE DOMAIN 8.
FT   DOMAIN     2344   2436       IG-LIKE C2-TYPE DOMAIN 9.
FT   DOMAIN     2437   2532       IG-LIKE C2-TYPE DOMAIN 10.
FT   DOMAIN     2533   2619       IG-LIKE C2-TYPE DOMAIN 11.
FT   DOMAIN     2620   2720       IG-LIKE C2-TYPE DOMAIN 12.
FT   DOMAIN     2721   2809       IG-LIKE C2-TYPE DOMAIN 13.
FT   DOMAIN     2810   2895       IG-LIKE C2-TYPE DOMAIN 14.
FT   DOMAIN     2896   2980       IG-LIKE C2-TYPE DOMAIN 15.
FT   DOMAIN     2981   3130       LAMININ G-LIKE 1 (GLOBULAR DOMAIN V A).
FT   DOMAIN     3049   3241       EGF-LIKE 1.
FT   DOMAIN     3304   3495       EGF-LIKE 2.
FT   DOMAIN     3558   3705       LAMININ G-LIKE 2 (GLOBULAR DOMAIN V B).
FT   SITE         65     67       HEPARAN SULFATE (POTENTIAL).
FT   SITE         71     73       HEPARAN SULFATE (POTENTIAL).
FT   SITE         76     78       HEPARAN SULFATE (POTENTIAL).
FT   SITE       3615   3617       MEDIATES MOTOR NEURON ATTACHMENT
FT                                (POTENTIAL).
FT   DISULFID    199    212       BY SIMILARITY.
FT   DISULFID    206    225       BY SIMILARITY.
FT   DISULFID    219    234       BY SIMILARITY.
FT   DISULFID    285    297       BY SIMILARITY.
FT   DISULFID    292    310       BY SIMILARITY.
FT   DISULFID    304    319       BY SIMILARITY.
FT   DISULFID    325    337       BY SIMILARITY.
FT   DISULFID    332    350       BY SIMILARITY.
FT   DISULFID    344    359       BY SIMILARITY.
FT   DISULFID    368    381       BY SIMILARITY.
FT   DISULFID    375    394       BY SIMILARITY.
FT   DISULFID    388    403       BY SIMILARITY.
FT   DISULFID    428    479       BY SIMILARITY.
FT   DISULFID    764    773       BY SIMILARITY.
FT   DISULFID    766    780       BY SIMILARITY.
FT   DISULFID    783    792       BY SIMILARITY.
FT   DISULFID    795    811       BY SIMILARITY.
FT   DISULFID    814    829       BY SIMILARITY.
FT   DISULFID    816    839       BY SIMILARITY.
FT   DISULFID    842    851       BY SIMILARITY.
FT   DISULFID    854    869       BY SIMILARITY.
FT   DISULFID   1159   1168       BY SIMILARITY.
FT   DISULFID   1161   1175       BY SIMILARITY.
FT   DISULFID   1178   1187       BY SIMILARITY.
FT   DISULFID   1190   1206       BY SIMILARITY.
FT   DISULFID   1209   1224       BY SIMILARITY.
FT   DISULFID   1211   1234       BY SIMILARITY.
FT   DISULFID   1237   1246       BY SIMILARITY.
FT   DISULFID   1249   1263       BY SIMILARITY.
FT   DISULFID   1275   1287       BY SIMILARITY.
FT   DISULFID   1277   1293       BY SIMILARITY.
FT   DISULFID   1295   1304       BY SIMILARITY.
FT   DISULFID   1307   1322       BY SIMILARITY.
FT   DISULFID   1563   1572       BY SIMILARITY.
FT   DISULFID   1565   1579       BY SIMILARITY.
FT   DISULFID   1582   1591       BY SIMILARITY.
FT   DISULFID   1594   1610       BY SIMILARITY.
FT   DISULFID   1613   1628       BY SIMILARITY.
FT   DISULFID   1615   1638       BY SIMILARITY.
FT   DISULFID   1641   1650       BY SIMILARITY.
FT   DISULFID   1653   1668       BY SIMILARITY.
FT   DISULFID   1792   1839       BY SIMILARITY.
FT   DISULFID   1886   1932       BY SIMILARITY.
FT   DISULFID   1976   2021       BY SIMILARITY.
FT   DISULFID   2073   2118       BY SIMILARITY.
FT   DISULFID   2170   2215       BY SIMILARITY.
FT   DISULFID   2268   2313       BY SIMILARITY.
FT   DISULFID   2365   2413       BY SIMILARITY.
FT   DISULFID   2456   2506       BY SIMILARITY.
FT   DISULFID   2554   2599       BY SIMILARITY.
FT   DISULFID   2641   2686       BY SIMILARITY.
FT   DISULFID   2831   2876       BY SIMILARITY.
FT   DISULFID   2917   2962       BY SIMILARITY.
FT   CARBOHYD     65     65       GLYCOSAMINOGLYCAN (POTENTIAL).
FT   CARBOHYD     71     71       GLYCOSAMINOGLYCAN (POTENTIAL).
FT   CARBOHYD     76     76       GLYCOSAMINOGLYCAN (POTENTIAL).
FT   CARBOHYD     89     89       POTENTIAL.
FT   CARBOHYD    358    358       POTENTIAL.
FT   CARBOHYD    554    554       POTENTIAL.
FT   CARBOHYD   1256   1256       POTENTIAL.
FT   CARBOHYD   1891   1891       POTENTIAL.
FT   CARBOHYD   2336   2336       POTENTIAL.
FT   CARBOHYD   2394   2394       POTENTIAL.
FT   CARBOHYD   2427   2427       POTENTIAL.
FT   CARBOHYD   2600   2600       POTENTIAL.
FT   CARBOHYD   3098   3098       POTENTIAL.

Note: remainder of annotations omitted. 



Query Match 12.9%; Score 105; DB 1; Length 3707; 

Best Local Similarity 63.2%; Pred. No. 1.16e-05;

Matches 12; Conservative 3; Mismatches 3; Indels 1; Gaps 1; 

Db 3446 CLCQDGFKGDLCEHEENPC 3464 

III! II I: l::l III 
Qy 12 CLCQPGFSGEHCQQE-NPC 29 



RESULT 15
ID LRP1_CHICK STANDARD; PRT; 4543 AA.
AC P98157;
DT 01-OCT-1996 (REL. 34, CREATED)
DT 01-OCT-1996 (REL. 34, LAST SEQUENCE UPDATE)
DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)
DE LOW-DENSITY LIPOPROTEIN RECEPTOR-RELATED PROTEIN 1 PRECURSOR (LRP)
DE (ALPHA-2-MACROGLOBULIN RECEPTOR) (A2MR).
OS GALLUS GALLUS (CHICKEN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ARCHOSAURIA; AVES;
OC NEOGNATHAE; GALLIFORMES; PHASIANIDAE; PHASIANINAE; GALLUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=WHITE LEGHORN; TISSUE=LIVER, AND OVARY;
RX MEDLINE; 94103212.
RA NIMPF J., STIFANI S., BILOUS P.T., SCHNEIDER W.J.;
RT "The somatic cell-specific low density lipoprotein receptor-related
RT protein of the chicken. Close kinship to mammalian low density
RT lipoprotein receptor gene family members.";
RL J. BIOL. CHEM. 269:212-219(1994).
CC -!- FUNCTION: INVOLVED IN THE PLASMA CLEARANCE OF CHYLOMICRON REMNANTS
CC AND ACTIVATED ALPHA 2-MACROGLOBULIN, AS WELL AS THE LOCAL
CC METABOLISM OF COMPLEXES BETWEEN PLASMINOGEN ACTIVATORS AND THEIR
CC ENDOGENOUS INHIBITORS. BINDS VITELLOGENIN, CALCIUM AND ALPHA 2-
CC MACROGLOBULIN.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- TISSUE SPECIFICITY: SOMATIC.
CC -!- PTM: CLEAVED INTO A 85 KD MEMBRANE-SPANNING SUBUNIT (LRP-85) AND
CC A 515 KD LARGE EXTRACELLULAR DOMAIN (LRP-515) THAT REMAINS NON-
CC COVALENTLY ASSOCIATED.
CC -!- ALTERNATIVE PRODUCTS: IN CLONE JN18, AN ASP IS REPLACED BY
CC SER-GLU-ARG-GLN-ASP DUE TO ALTERNATIVE SPLICING OF EXON3.
CC -!- SIMILARITY: CONTAINS 22 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 31 LDL-RECEPTOR CLASS A DOMAINS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; X74904; G438007; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 3.
DR PROSITE; PS00022; EGF_1; 5.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 2.
DR PROSITE; PS01209; LDLRA_1; 27.
DR PROSITE; PS50068; LDLRA_2; 31.
DR PFAM; PF00008; EGF; 14.
DR PFAM; PF00057; ldl_recept_a; 31.
DR PFAM; PF00058; ldl_recept_b; 33.
DR HSSP; P01130; 1AJJ.
KW RECEPTOR; TRANSMEMBRANE; REPEAT; ENDOCYTOSIS; GLYCOPROTEIN;
KW SIGNAL; CALCIUM-BINDING; EGF-LIKE DOMAIN; COATED PITS;
KW ALTERNATIVE SPLICING.



FT   SIGNAL        1     21       POTENTIAL.
FT   CHAIN        22   4543       LOW-DENSITY LIPOPROTEIN RECEPTOR-RELATED
FT                                PROTEIN 1.
FT   DOMAIN       22   4419       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM   4420   4443       POTENTIAL.
FT   DOMAIN     4444   4543       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       27     68       LDL-RECEPTOR CLASS A 1.
FT   DOMAIN       72    112       LDL-RECEPTOR CLASS A 2.
FT   DOMAIN      113    151       EGF-LIKE 1.
FT   DOMAIN      152    191       EGF-LIKE 2, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      476    522       EGF-LIKE 3.
FT   DOMAIN      801    841       EGF-LIKE 4.
FT   DOMAIN      850    890       LDL-RECEPTOR CLASS A 3.
FT   DOMAIN      891    931       LDL-RECEPTOR CLASS A 4.
FT   DOMAIN      932    971       LDL-RECEPTOR CLASS A 5.
FT   DOMAIN      972   1011       LDL-RECEPTOR CLASS A 6.
FT   DOMAIN     1011   1051       LDL-RECEPTOR CLASS A 7.
FT   DOMAIN     1058   1097       LDL-RECEPTOR CLASS A 8.
FT   DOMAIN     1100   1140       LDL-RECEPTOR CLASS A 9.
FT   DOMAIN     1141   1180       LDL-RECEPTOR CLASS A 10.
FT   DOMAIN     1181   1220       EGF-LIKE 5.
FT   DOMAIN     1221   1260       EGF-LIKE 6.
FT   DOMAIN     1534   1577       EGF-LIKE 7.
FT   DOMAIN     1842   1883       EGF-LIKE 8.
FT   DOMAIN     2151   2191       EGF-LIKE 9.
FT   DOMAIN     2472   2512       EGF-LIKE 10.
FT   DOMAIN     2516   2557       LDL-RECEPTOR CLASS A 11.
FT   DOMAIN     2558   2596       LDL-RECEPTOR CLASS A 12.
FT   DOMAIN     2597   2635       LDL-RECEPTOR CLASS A 13.
FT   DOMAIN     2636   2684       LDL-RECEPTOR CLASS A 14.
FT   DOMAIN     2688   2730       LDL-RECEPTOR CLASS A 15.
FT   DOMAIN     2730   2769       LDL-RECEPTOR CLASS A 16.
FT   DOMAIN     2770   2812       LDL-RECEPTOR CLASS A 17.
FT   DOMAIN     2814   2853       LDL-RECEPTOR CLASS A 18.
FT   DOMAIN     2854   2897       LDL-RECEPTOR CLASS A 19.
FT   DOMAIN     2900   2938       LDL-RECEPTOR CLASS A 20.
FT   DOMAIN     2939   2978       EGF-LIKE 11.
FT   DOMAIN     2979   3019       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     3287   3328       EGF-LIKE 13.
FT   DOMAIN     3329   3368       LDL-RECEPTOR CLASS A 21.
FT   DOMAIN     3369   3407       LDL-RECEPTOR CLASS A 22.
FT   DOMAIN     3408   3447       LDL-RECEPTOR CLASS A 23.
FT   DOMAIN     3448   3488       LDL-RECEPTOR CLASS A 24.
FT   DOMAIN     3489   3530       LDL-RECEPTOR CLASS A 25.
FT   DOMAIN     3531   3569       LDL-RECEPTOR CLASS A 26.
FT   DOMAIN     3570   3608       LDL-RECEPTOR CLASS A 27.
FT   DOMAIN     3608   3646       LDL-RECEPTOR CLASS A 28.
FT   DOMAIN     3649   3689       LDL-RECEPTOR CLASS A 29.
FT   DOMAIN     3690   3730       LDL-RECEPTOR CLASS A 30.
FT   DOMAIN     3736   3776       LDL-RECEPTOR CLASS A 31.
FT   DOMAIN     3779   3821       EGF-LIKE 14.
FT   DOMAIN     3822   3859       EGF-LIKE 15.
FT   DOMAIN     4146   4182       EGF-LIKE 16.
FT   DOMAIN     4195   4231       EGF-LIKE 17.
FT   DOMAIN     4231   4267       EGF-LIKE 18.
FT   DOMAIN     [illegible]       EGF-LIKE 19.
FT   DOMAIN     4303   4339       EGF-LIKE 20.
FT   DOMAIN     4339   4374       EGF-LIKE 21.
FT   DOMAIN     4372   4409       EGF-LIKE 22.
FT   SITE       3939   3942       RECOGNITION SITE FOR PROTEOLYTICAL
FT                                [illegible]
FT   SITE       4472   4472       CRITICAL FOR ENDOCYTOSIS (BY SIMILARITY).
FT   SITE       4506   4506       CRITICAL FOR ENDOCYTOSIS (BY SIMILARITY).
FT   DISULFID     29     42       BY SIMILARITY.
FT   DISULFID     36     55       BY SIMILARITY.
FT   DISULFID     49     66       BY SIMILARITY.
FT   DISULFID     74     87       BY SIMILARITY.
FT   DISULFID     81    100       BY SIMILARITY.
FT   DISULFID     94    110       BY SIMILARITY.
FT   DISULFID   [illegible] 126   BY SIMILARITY.
FT   DISULFID    122    135       BY SIMILARITY.
FT   DISULFID    137    150       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID    177    190       BY SIMILARITY.
FT   DISULFID    480    495       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID    508    521       BY SIMILARITY.
FT   DISULFID    805    816       BY SIMILARITY.
FT   DISULFID    812 [illegible]  BY SIMILARITY.
FT   DISULFID    827    840       BY SIMILARITY.
FT   DISULFID    852    864       BY SIMILARITY.
FT   DISULFID   [illegible] 877   BY SIMILARITY.
FT   DISULFID    871    888       BY SIMILARITY.
FT   DISULFID    893    905       BY SIMILARITY.
FT   DISULFID    900    918       BY SIMILARITY.
FT   DISULFID    912    929       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID    953    969       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID   [illegible] 1025  BY SIMILARITY.
FT   DISULFID   1020   1038       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID   1060   1073       BY SIMILARITY.
FT   DISULFID   1067   1086       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID   1102   1116       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID   [illegible] 1157  BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID   [illegible] 1244  BY SIMILARITY.
FT   DISULFID   1246   1259       BY SIMILARITY.
FT   DISULFID   1538   1551       BY SIMILARITY.
FT   DISULFID   [illegible] 1561  BY SIMILARITY.
FT   DISULFID   1563   1576       BY SIMILARITY.
FT   DISULFID   1846   1857       BY SIMILARITY.
FT   DISULFID   1853   1867       BY SIMILARITY.
FT   DISULFID   1869   1882       BY SIMILARITY.
FT   DISULFID   [illegible]       BY SIMILARITY.
FT   DISULFID   2162   2176       BY SIMILARITY.
FT   DISULFID   2178   2190       BY SIMILARITY.
FT   DISULFID   2476   2487       BY SIMILARITY.
FT   DISULFID   2483   2497       BY SIMILARITY.
FT   DISULFID   2499   2511       BY SIMILARITY.
FT   DISULFID   2518   2531       BY SIMILARITY.
FT   DISULFID   2526   2544       BY SIMILARITY.
FT   DISULFID   2538   2555       BY SIMILARITY.
FT   DISULFID   2560   2572       BY SIMILARITY.
FT   DISULFID   2567   2585       BY SIMILARITY.
FT   DISULFID   2579   2594       BY SIMILARITY.
FT   DISULFID   2599   2611       BY SIMILARITY.
FT   DISULFID   2606   2624       BY SIMILARITY.
FT   DISULFID   2618   2633       BY SIMILARITY.
FT   DISULFID   2638   2660       BY SIMILARITY.
FT   DISULFID   2654   2673       BY SIMILARITY.
FT   DISULFID   2667   2682       BY SIMILARITY.
FT   DISULFID   2690   2702       BY SIMILARITY.
FT   DISULFID   2697   2715       BY SIMILARITY.
FT   DISULFID   2709   2724       BY SIMILARITY.
FT   DISULFID   2732   2744       BY SIMILARITY.
FT   DISULFID   2739   2757       BY SIMILARITY.
FT   DISULFID   2751   2767       BY SIMILARITY.
FT   DISULFID   2772   2785       BY SIMILARITY.
FT   DISULFID   2779   2798       BY SIMILARITY.
FT   DISULFID   2792   2810       BY SIMILARITY.

Note: remainder of annotations omitted.

Query Match 12.9%; Score 105; DB 1; Length 4543;

Best Local Similarity 34.3%; Pred. No. 1.16e-05;

Matches 23; Conservative 10; Mismatches 29; Indels 5; Gaps 5; 

Db 4317 CQMSRDGVKQCRCPPQFEGAQCQ-DNKCSRCQEGKCNINRQSGDVSCICPDGKIAP-SCL 4374

:l I I II I :|| :| I I : I II I II :|:: I 
Qy 2 CHISDQGEPYCLCQPGFSGEHCQQENPCLG-QVVREVIRRQKGYASCA-TASKVPIMECR 59

Db 4375 T-CDSYC 4380 
I : I 

Qy 60 GGCGPQC 66 



Search completed: Fri May 28 09:00:04 1999 
Job time : 16 sees. 
Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm



Fri May 28 09:00:22 1999; MasPar time 9 

568.587 Million cell updates/sec 

Tabular output not generated.

Title: >US-09-191-647-6

Description: (1-103) from US09191647.pep

Perfect Score: 813 

Sequence: 1 QCHISDQGEPYCLCQPGFSG GSSFVEEVERHLECGCLACS 103 

Scoring table: PAM 150 
Gap 11 

Searched: 179066 seqs, 54579741 residues 

Post-processing: Minimum Match 0%

Listing first 45 summaries 

Database: sptrembl9 

1:sp_archea 2:sp_bacteria 3:sp_fungi 4:sp_human
5:sp_invertebrate 6:sp_mammal 7:sp_mhc 8:sp_organelle
9:sp_phage 10:sp_plant 11:sp_rodent 12:sp_unclassified
13:sp_vertebrate 14:sp_virus

Statistics: Mean 36.755; Variance 55.164; scale 0.666

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.
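
The Query Match percentage printed for each hit in this report is consistent with the hit score expressed as a fraction of the perfect score (813 for this query). The short Python sketch below checks that reading against values printed in this report. The function name is illustrative, and the formula is an inference from the printed numbers rather than a statement taken from the search program's documentation.

```python
def query_match_percent(score: int, perfect_score: int) -> float:
    """Hit score as a percentage of the query's perfect (self-match) score.

    Assumption: this is how the report derives its Query Match column.
    """
    if perfect_score <= 0:
        raise ValueError("perfect_score must be positive")
    return round(100.0 * score / perfect_score, 1)


PERFECT_SCORE = 813  # "Perfect Score" printed for query US-09-191-647-6

# (score, Query Match % as printed in the report)
reported = [(813, 100.0), (740, 91.0), (413, 50.8), (307, 37.8), (105, 12.9)]

for score, printed in reported:
    computed = query_match_percent(score, PERFECT_SCORE)
    print(f"score {score:4d}: computed {computed:5.1f}%  printed {printed:5.1f}%")
```

Under this assumption every computed value agrees with the printed one to one decimal place.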



                     Query
 No.  Score  Match  Length  DB  ID      Description              Pred. No.
   1    813  100.0     739   4  O75094  MEGF5 (FRAGMENT).        1.40e-189
   2    740   91.0    1523  11  O88280  MEGF5.                   2.05e-169
   3    413   50.8    1531  11  O88279  MEGF4.                   1.01e-80
   4    307   37.8      79   4  O75093  MEGF4 (FRAGMENT).        4.61e-53
   5    176   21.6     601   5  Q20204  F40E10.4 PROTEIN (FRAG   8.68e-21
   6    121   14.9     367  11  O54775  ELM1.                    1.21e-08
   7    120   14.8    1722   5  Q19350  SIMILAR TO EGF-LIKE RE   1.94e-08
   8    116   14.3     381   4  O43775  CYR61 PROTEIN.           1.28e-07
   9    116   14.3     955   4  Q99466  NOTCH4 (FRAGMENT).       1.28e-07
  10    116   14.3    1999   4  Q99940  NOTCH4.                  1.28e-07
  11    116   14.3    2003   4  O00306  NOTCH4.                  1.28e-07
  12    112   13.8    1687  11  Q61204  NOTCH2-LIKE (EGF REPEA   8.16e-07
  13    112   13.8    2470  11  O35516  CELL SURFACE PROTEIN.    8.16e-07
  14    111   13.7    2352   5  O61240  HRNOTCH PROTEIN.         1.29e-06
  15    110   13.5    1964  11  O35442  NOTCH4.                  2.04e-06
  16    110   13.5    2447  13  O13149  NOTCH 2 (FRAGMENT).      2.04e-06
  17    109   13.4     434  11  O55139  JAGGED2 PROTEIN (FRAGM   3.22e-06
  18    109   13.4     518  11  O70219  JAGGED 2 (JAGGED 2 PRO   3.22e-06
  19    109   13.4    1202  11  P97607  JAGGED2 (FRAGMENT).      3.22e-06
  20    108   13.3    1203  11  Q06008  NOTCH PROTEIN HOMOLOG    5.07e-06
  21    107   13.2     530   5  Q24526  SLIT LOCUS ENCODING A    7.97e-06
  22    103   12.7     308   6  O46370  PREADIPOCYTE FACTOR-1.   4.77e-05
  23    103   12.7     802  13  O57462  DELTAA.                  4.77e-05
  24    103   12.7    1212  13  O42347  C-SERRATE-2 (FRAGMENT)   4.77e-05
  25    103   12.7    2653   5  Q25253  NOTCH HOMOLOG SCALLOPE   4.77e-05
  26    102   12.5     387  11  Q06007  NOTCH PROTEIN HOMOLOG    7.42e-05
  27    102   12.5     406   5  Q25059  FIBROPELLIN III (FRAGM   7.42e-05
  28    101   12.4     717  13  P87357  DELTAD TRANSMEMBRANE P   1.15e-04
  29    101   12.4    1042   4  Q13792  APOMUCIN (FRAGMENT).     1.15e-04
  30    101   12.4    1081   4  O76065  MUC5AC PROTEIN (FRAGME   1.15e-04
  31    101   12.4    2180   5  O01768  SIMILARITY TO EGF-LIKE   1.15e-04
  32    101   12.4    2531   5  O16004  NOTCH HOMOLOG.           1.15e-04
  33    100   12.3     752  13  O42374  NOTCH RECEPTOR PROTEIN   1.78e-04
  34     99   12.2     263   4  Q99740  SOLUBLE PROTEIN JAGGED   2.76e-04
  35     99   12.2     529   5  Q25058  FIBROPELLIN IA (FRAGME   2.76e-04
  36     99   12.2     728  13  Q90656  TRANSMEMBRANE PROTEIN    2.76e-04
  37     99   12.2    1218   4  O14902  TRANSMEMBRANE PROTEIN    2.76e-04
  38     99   12.2    1218   4  O15122  JAGGED1.                 2.76e-04
  39     99   12.2    1218   4  Q15816  TRANSMEMBRANE PROTEIN    2.76e-04
  40     99   12.2    1219  11  Q63722  JAGGED PROTEIN.          2.76e-04
  41     99   12.2    1227   4  P78504  JAGGED 1 (TRANSMEMBRAN   2.76e-04
  42     98   12.1     597  11  O35727  FACTOR XII.              4.25e-04
  43     98   12.1     762  13  O42373  NOTCH RECEPTOR PROTEIN   4.25e-04
  44     98   12.1    1193  13  Q90819  C-SERATE-1 PROTEIN (FR   4.25e-04
  45     98   12.1    1372   5  P91526  SIMILARITY TO MULTIPLE   4.25e-04

ALIGNMENTS

RESULT 1
ID O75094 PRELIMINARY; PRT; 739 AA.
AC O75094;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF5 (FRAGMENT).
GN MEGF5.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011538; D1033429; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1
SQ SEQUENCE 739 AA; 80364 MW; DC6BCB63 CRC32;

Query Match 100.0%; Score 813; DB 4; Length 739;
Best Local Similarity 100.0%; Pred. No. 1.40e-189;
Matches 103; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Db 637 QCHISDQGEPYCLCQPGFSGEHCQQENPCLGQVVREVIRRQKGYASCATASKVPIMECRG 696
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy   1 QCHISDQGEPYCLCQPGFSGEHCQQENPCLGQVVREVIRRQKGYASCATASKVPIMECRG 60

Db 697 GCGPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGCLACS 739
       |||||||||||||||||||||||||||||||||||||||||||
Qy  61 GCGPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGCLACS 103
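
The Best Local Similarity figure printed with each alignment in this report is consistent with the number of identical positions divided by the total number of alignment columns (matches + conservative + mismatches + indels). The Python sketch below checks that reading against the counts printed for several alignments in this report. The function name is illustrative, and the formula is an inference from the printed values rather than a definition quoted from the search program.

```python
def best_local_similarity(matches: int, conservative: int,
                          mismatches: int, indels: int) -> float:
    """Percentage of alignment columns that are exact matches.

    Assumption: the alignment length is the sum of the four printed counts.
    """
    columns = matches + conservative + mismatches + indels
    if columns <= 0:
        raise ValueError("alignment must contain at least one column")
    return round(100.0 * matches / columns, 1)


# (matches, conservative, mismatches, indels, similarity % as printed)
reported = [
    (103, 0, 0, 0, 100.0),
    (91, 6, 6, 0, 88.3),
    (52, 20, 31, 1, 50.0),
    (39, 16, 23, 1, 49.4),
    (23, 10, 29, 5, 34.3),
]

for m, c, x, i, printed in reported:
    print(f"computed {best_local_similarity(m, c, x, i):5.1f}%  printed {printed:5.1f}%")
```

Because the column count comes from the printed tallies, a disagreement between the computed and printed percentage is a quick way to spot a digit misread in the Matches, Conservative, Mismatches, or Indels fields.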



RESULT 2 

ID O88280 PRELIMINARY; PRT; 1523 AA.
AC O88280;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF5.
GN MEGF5.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011531; D1033424; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1523 AA; 167767 MW; 2BD845D0 CRC32;

Query Match 91.0%; Score 740; DB 11; Length 1523;

Best Local Similarity 88.3%; Pred. No. 2.05e-169;

Matches 91; Conservative 6; Mismatches 6; Indels 0; Gaps 0;

Db 1421 QCHISDRGEPYCLCQPGFSGNHCEQENPCLGEIVREAIRRQKDYASCATASKVPIMVCRG 1480 

IMI|:||!IIIIIIMI|:|I:|IIIIH::|| Mill 1 1 1 1 1 1 1 1 1 1 1 1 1 III 
Qy 1 QCHISDQGEPYCLCQPGFSGEHCQQENPCLGQVVREVIRRQKGYASCATASKVPIMECRG 60

Db 1481 GCGSQCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCRECS 1523 

Mhllll IIIIIIMIIIIIIIIIIIIIIIIIIIH || 
Qy 61 GCGPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGCLACS 103 



RESULT 3 

ID O88279 PRELIMINARY; PRT; 1531 AA.
AC O88279;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF4.
GN MEGF4.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011530; D1033423; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1531 AA; 167497 MW; 5C5EBDF4 CRC32;

Query Match 50.8%; Score 413; DB 11; Length 1531;

Best Local Similarity 50.0%; Pred. No. 1.01e-80;

Matches 52; Conservative 20; Mismatches 31; Indels 1; Gaps 1; 

Db 1428 HCQASATRGAHCVCSPGFSGELCEQESECRGDPVRDFHRVQRGYAICQTTRPLSWVECRG 1487 

:|: I : hi llllll 1 : 1 1 : I h II: I hill I h :: :|||| 
Qy 1 QCHISDQGEPYCLCQPGFSGEHCQQENPCLGQVVREVIRRQKGYASCATASKVPIMECRG 60

Db 1488 ACPGQGCCQGLRLKRRKLTFECSDGTSFAEEVEKPTKCGCAPCA 1531 

:| I III I Mil hhllMI till: II! :|: 
Qy 61 GC-GPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGCLACS 103



RESULT 4 

ID 075093 PRELIMINARY; PRT; 79 AA. 

AC 075093; 

DT 01-NOV-1998 (TREMBLREL. 08, CREATED) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL, 08, LAST ANNOTATION UPDATE) 

DE MEGF4 (FRAGMENT), 

GN MEGF4. 

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 

OC CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=BRAIN; 

RX MEDLINE; 98360089. 

RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;

RT "Identification of high-molecular-weight proteins with multiple

RT EGF-like motifs by motif-trap screening.";

RL GENOMICS 51:27-34(1998). 
DR EMBL; AB011537; D1033428; -. 
DR PROSITE; PS01185; CTCK_1; 1.

FT NON_TER 1 1

SQ SEQUENCE 79 AA; 8809 MW; 96C95FFE CRC32; 

Query Match 37.8%; Score 307; DB 4; Length 79;

Best Local Similarity 49.4%; Pred. No. 4.61e-53;

Matches 39; Conservative 16; Mismatches 23; Indels 1; Gaps 1; 

Db 1 ESECRGDPVRDFHQVQRGYAICQTTRPLSWVECRGSCPGQGCCQGLRLKRRKFTFECSDG 60 

I: I I: II: : hill I h :: :|llhl I III I lllh hhil 
Qy 26 ENPCLGQWREVIRRQKGYASCATASKVPIMECRGGC-GPQCCQPTRSKRRKYVFQCTDG 84 

Db 61 TSFAEEVEKPTKCGCALCA 79 

:M lllh III h 
Qy 85 SSFVEEVERHLECGCLACS 103 



RESULT 5 

ID Q20204 PRELIMINARY; PRT; 601 AA. 

AC Q20204; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE) 

DE F40E10.4 PROTEIN (FRAGMENT).

GN F40E10.4.

OS CAENORHABDITIS ELEGANS. 

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA; 

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RA SMYE R.;

RL SUBMITTED (FEB-1996) TO EMBL/GENBANK/DDBJ DATA BANKS. 

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 94150718.

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,

RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,

RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,

RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,

RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,

RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.

RT elegans."; 

RL NATURE 368:32-38(1994). 

DR EMBL; 269792; E1346469; -. 

DR PROSITE; PS01187; EGF_CA; 1.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 601 AA; 66669 MW; F8A72773 CRC32;

Query Match 21.6%; Score 176; DB 5; Length 601;

Best Local Similarity 32.6%; Pred. No. 8.68e-21;

Matches 30; Conservative 20; Mismatches 36; Indels 6; Gaps 6; 

Db 503 YMCQCDSHFSGEHCD-EKR-I-KCDKQKFRRHHIENECRSVDRIKIAECNGYCGGEQNCC 559

I I I- llllll: |: ■ :: I : :: I II I II Ml 

Qy 11 Y-CLCQPGFSGEHCQQENPCLGQWREVIRRQKGYASCATASKVPIMECRGGCGP-Q-CC 67 

Db 560 TAVKKKQRKVKMICKNGTTKISTVHIIRQCQC 591

: : 1:11 I :|:: : I :| I
Qy 68 QPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99



RESULT 6

ID O54775 PRELIMINARY; PRT; 367 AA.
AC O54775;
DT 01-JUN-1998 (TREMBLREL. 06, CREATED)
DT 01-JUN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE ELM1.
GN ELM1.
OS MUS MUSCULUS (MOUSE).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=HEN;
RX MEDLINE; 98119879.
RA HASHIMOTO Y., SHINDO-OKADA N., TANI M., NAGAMACHI Y., TAKEUCHI K.,
RA SHIROISHI T., TOMA H., YOKOTA J.;
RT "Expression of the Elm1 gene, a novel gene of the CCN (connective
RT tissue growth factor, Cyr61/Cef10, and neuroblastoma overexpressed
RT gene) family, suppresses in vivo tumor growth and metastasis of
RT K-1735 murine melanoma cells.";
RL J. EXP. MED. 187:289-296(1998).
DR EMBL; AB004873; D1025874; -.
DR PROSITE; PS01185; CTCK_1; 1.
SQ SEQUENCE 367 AA; 40702 MW; 1AB35AB9 CRC32; 

Query Match 14.9%; Score 121; DB 11; Length 367; 

Best Local Similarity 33.3%; Pred. No. 1.21e-08; 

Matches 20; Conservative 12; Mismatches 25; Indels 3; Gaps 3; 



Db 288 AGCVSTRTYRPKYC-GVCTDNRCCIPYKSKTISVDFQCPEGPGFSRQVLWINACFCNLSC 346

1:1 " : I I I :ll I :ll III :|::| :| : II |:| 
Qy 45 ASCATASKVPIMECRGGC-GPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC-LAC 102



RESULT 7 

ID Q19350 PRELIMINARY; PRT; 1722 AA. 

AC Q19350; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE SIMILAR TO EGF-LIKE REPEATS. NCBI GI: 1125776.

GN F11C7.4. 

OS CAENORHABDITIS ELEGANS. 

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA; 

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=BRISTOL N2;

RX MEDLINE; 94150718. 

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,

RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,

RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,

RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,

RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,

RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.

RT elegans."; 

RL NATURE 368:32-38(1994). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=BRISTOL N2;

RA TAICH A., VETTER J.;

RL SUBMITTED (JAN-1996) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; U42839; G1125776; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 5.

DR PROSITE; PS01186; EGF_2; 19. 

DR PROSITE; PS01187; EGF_CA; 3. 

DR PFAM; PF00008; EGF; 24. 

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 1722 AA; 188383 MW; CCFB86B8 CRC32;



Query Match 14.8%; Score 120; DB 5; Length 1722; 

Best Local Similarity 39.3%; Pred. No. 1.94e-08;

Matches 11; Conservative 7; Mismatches 10; Indels 0; Gaps 0;

Db 380 CKLDAEGEPFCVCEEGFDGPFCEPKSGC 407

I : :|ll:hl: II I I: : I
Qy 2 CHISDQGEPYCLCQPGFSGEHCQQENPC 29



RESULT 8 

ID O43775 PRELIMINARY; PRT; 381 AA.

AC O43775;

DT 01-JUN-1998 (TREMBLREL. 06, CREATED)

DT 01-JUN-1998 (TREMBLREL. 06, LAST SEQUENCE UPDATE)

DT 01-JUN-1998 (TREMBLREL. 06, LAST ANNOTATION UPDATE)

DE CYR61 PROTEIN.

GN CYR61. 

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 

OC CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RA MARTINERIE C., VIEGAS-PEQUIGNOT E., NGUYEN V.C., PERBAL B.;

RL J. CLIN. PATHOL. MOL. PATHOL. 50:130-136(1997).

DR EMBL; Y11307; E304665; -.

DR PROSITE; PS00222; IGF_BINDING; 1.

DR PROSITE; PS01185; CTCK_1; 1.

DR PROSITE; PS01208; VWFC; 1.

SQ SEQUENCE 381 AA; 42025 MW; 1B18FF1A CRC32; 



Query Match 14.3%; Score 116; DB 4; Length 381;

Best Local Similarity 33.3%; Pred. No. 1.28e-07;

Matches 19; Conservative 10; Mismatches 26; Indels 2; Gaps 2;

Db 300 YAGCLSVKKYRPKYC-GSCVDGRCCTPQLTRTVKMRFRCEDGETFSKNVMMIQSCKC 355

11:1 : I : I hi =11 I :: I hi II :| :| I I
Qy 44 YASCATASKVPIMECRGGC-GPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGC 99



RESULT 9 

ID Q99466 PRELIMINARY; PRT; 955 AA. 

AC Q99466; 

DT 01-MAY-1997 (TREMBLREL. 03, CREATED) 

DT 01-MAY-1997 (TREMBLREL. 03, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE NOTCH4 (FRAGMENT).

GN NOTCH4.

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 

OC CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 97311416. 

RA SUGAYA K., SASANUMA S., NOHATA J., KIMURA T., FUKAGAWA T.,

RA MAMURA Y., ANDO A., INOKO H., IKEMURA T., MITA K.;

RT "Gene organization of human NOTCH4 and (CTG)n polymorphism in this

RT human counterpart gene of mouse proto-oncogene Int3."; 

RL GENE 189:235-244(1997). 

DR EMBL; D86566; D1013803; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 11.

DR PROSITE; PS01186; EGF_2; 17.

DR PROSITE; PS01187; EGF_CA; 9.

DR PFAM; PF00008; EGF; 21.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 955 955

SQ SEQUENCE 955 AA; 100017 MW; 28507B36 CRC32; 

Query Match 14.3%; Score 116; DB 4; Length 955; 

Best Local Similarity 46.9%; Pred. No. 1.28e-07;

Matches 15; Conservative 5; Mismatches 10; Indels 2; Gaps 2; 

Db 406 QCSTNPLTGSTLCLCQPGYSGPTCHQDLDECL 437 

I! : I lllllhll 1:1: : II 
Qy 1 QCHISD-QGEPYCLCQPGFSGEHCQQE-NPCL 30

RESULT 10

ID Q99940 PRELIMINARY; PRT; 1999 AA.

AC Q99940;

DT 01-MAY-1997 (TREMBLREL. 03, CREATED) 

DT 01-MAY-1997 (TREMBLREL. 03, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE NOTCH4. 

GN NOTCH4. 

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 

OC CATARRHINI; HOMINIDAE; HOMO. 

RN [1]

RP SEQUENCE FROM N.A.

RA LI L., HUANG G., BANTA A., DENG Y., CHEN L., PHAM Q., ROWEN L.,

RA HOOD L.;

RL SUBMITTED (FEB-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; U89335; G1841543; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 11.

DR PROSITE; PS01186; EGF_2; 21.

DR PROSITE; PS01187; EGF_CA; 9.

DR PFAM; PF00008; EGF; 26.

DR PFAM; PF00023; ank; 5.

DR PFAM; PF00066; notch; 2. 

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 1999 AA; 209134 MW; 0680278E CRC32; 

Query Match 14.3%; Score 116; DB 4; Length 1999;

Best Local Similarity 46.9%; Pred. No. 1.28e-07;

Matches 15; Conservative 5; Mismatches 10; Indels 2; Gaps 2;

Db 405 QCSTNPLTGSTLCLCQPGYSGPTCHQDLDECL 436
II : I lllllhll hi: : II 
Qy 1 QCHISD-QGEPYCLCQPGFSGEHCQQE-NPCL 30 
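
The Matches, Indels and Best Local Similarity figures printed for this result can be cross-checked directly against the two aligned rows. The sketch below is only an illustration: the function name `alignment_counts` is hypothetical, and treating similarity as identities divided by alignment columns is an assumption inferred from the printed numbers, not a documented definition.

```python
def alignment_counts(db_row, qy_row):
    """Count identical columns and gap columns in two aligned rows.

    Assumption: '-' marks a gap and both rows have the same length.
    """
    if len(db_row) != len(qy_row):
        raise ValueError("aligned rows must have equal length")
    matches = sum(1 for d, q in zip(db_row, qy_row) if d == q and d != "-")
    indels = sum(1 for d, q in zip(db_row, qy_row) if d == "-" or q == "-")
    return matches, indels, len(db_row)


# Rows copied from the alignment above.
db = "QCSTNPLTGSTLCLCQPGYSGPTCHQDLDECL"
qy = "QCHISD-QGEPYCLCQPGFSGEHCQQE-NPCL"

matches, indels, columns = alignment_counts(db, qy)
similarity = 100.0 * matches / columns
print(matches, indels, columns, similarity)
```

Under these assumptions the counts agree with the report (15 matches, 2 indels over 32 columns, close to the printed 46.9%).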



RESULT 11 

ID O00306 PRELIMINARY; PRT; 2003 AA.

AC O00306;

DT 01-JUL-1997 (TREMBLREL. 04, CREATED) 

DT 01-JUL-1997 (TREMBLREL. 04, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE NOTCH4.

GN HNOTCH4.

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 

OC CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=BONE MARROW, HEART; 

RA LI L., HUANG G., BANTA A., YU D., ROWEN L., HOOD L.;

RL SUBMITTED (MAR-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; U95299; G2072309; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 11.

DR PROSITE; PS01186; EGF_2; 21.

DR PROSITE; PS01187; EGF_CA; 9.

DR PFAM; PF00008; EGF; 26.

DR PFAM; PF00023; ank; 5.

DR PFAM; PF00066; notch; 2.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 2003 AA; 209620 MW; 518CFE96 CRC32; 

Query Match 14.3%; Score 116; DB 4; Length 2003;

Best Local Similarity 46.9%; Pred. No. 1.28e-07;

Matches 15; Conservative 5; Mismatches 10; Indels 2; Gaps 2; 

Db 406 QCSTNPLTGSTLCLCQPGYSGPTCHQDLDECL 437 

II : I lllllhll hi: : II 
Qy 1 QCHISD-QGEPYCLCQPGFSGEHCQQE-NPCL 30



RESULT 12 

ID Q61204 PRELIMINARY; PRT; 1687 AA. 

AC Q61204; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE NOTCH2-LIKE (EGF REPEAT TRANSMEMBRANE PROTEIN).

GN NOTCH2L.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A.

RC STRAIN=C57BL/6J; TISSUE=WHOLE EMBRYO;

RA SELL C., HOFF III H.B.;

RL SUBMITTED (MAY-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; U57368; G1336628; -.

DR MGD; MGI:1202397; NOTCH2L.

DR PROSITE; PS00010; ASX_HYDROXYL; 2.

DR PROSITE; PS01187; EGF_CA; 2.

DR PROSITE; PS01186; EGF_2; 5.

DR PFAM; PF00008; EGF; 7. 

KW TRANSMEMBRANE; GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 1687 AA; 188528 MW; 73B9DDDC CRC32; 

Query Match 13.8%; Score 112; DB 11; Length 1687; 

Best Local Similarity 40.6%; Pred. No. 8.16e-07;

Matches 13; Conservative 9; Mismatches 8; Indels 2; Gaps 2; 

Db 350 CHMLSR-DTYECTCQVGFTGKQCQWTDACLSH 380

II: : : I I II Ihl HI ::||:: 
Qy 2 CHISDQGEPY-CLCQPGFSGEHCQQENPCLGQ 32 



RESULT 13 

ID O35516 PRELIMINARY; PRT; 2470 AA.

AC O35516;

DT 01-JAN-1998 (TREMBLREL. 05, CREATED) 

DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE CELL SURFACE PROTEIN. 

GN NOTCH2. 

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=C57B/6; TISSUE=THYMUS;

RX MEDLINE; 93178563.

RA LARDELLI M., LENDAHL U.;

RT "Motch A and motch B--two mouse Notch homologues coexpressed in a

RT wide variety of tissues.";

RL EXP. CELL RES. 204:364-372(1993).

RN [2] 

RP SEQUENCE FROM N.A. 
RC STRAIN=C57B/6; TISSUE=THYMUS;

RA HAMADA Y., HIGUCHI M., TSUJIMOTO Y.;

RL SUBMITTED (JUL-1994) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; D32210; D1022953; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS01186; EGF_2; 27.

DR PROSITE; PS01187; EGF_CA; 22.

DR PFAM; PF00008; EGF; 34. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 2. 

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 2470 AA; 265325 MW; CA94E03A CRC32;

Query Match 13.8%; Score 112; DB 11; Length 2470;

Best Local Similarity 40.6%; Pred. No. 8.16e-07;

Matches 13; Conservative 9; Mismatches 8; Indels 2; Gaps 2;

Db 121 CHMLSR-DTYECTCQVGFTGKQCQWTDACLSH 151
I II: : : I I II Ihl ;|| ::||:: 

Qy 2 CHISDQGEPY-CLCQPGFSGEHCQQENPCLGQ 32 



DR PFAM; PF00008; EGF; 26. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 2. 

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 1964 AA; 206699 MW; CE2CA3B6 CRC32; 

Query Match 13.5%; Score 110; DB 11; Length 1964; 

Best Local Similarity 43.8%; Pred. No. 2.04e-06;

Matches 14; Conservative 7; Mismatches 9; Indels 2; Gaps 2; 

Db 449 C-INTPGSFNCLCLPGYTGSRCEADHNECLSQ 479 

I I: I III I:: :|: : I ||;| 
Qy 2 CHISDQGEPYCLCQPGFSGEHCQQE-NPCLGQ 32 



Search completed: Fri May 28 09:00:54 1999 
Job time : 32 secs.



RESULT 14 

ID O61240 PRELIMINARY; PRT; 2352 AA.

AC O61240;

DT 01-AUG-1998 (TREMBLREL. 07, CREATED) 

DT 01-AUG-1998 (TREMBLREL. 07, LAST SEQUENCE UPDATE) 

DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE) 

DE HRNOTCH PROTEIN.

GN HRNOTCH. 

OS HALOCYNTHIA RORETZI (SEA SQUIRT). 

OC EUKARYOTA; METAZOA; CHORDATA; UROCHORDATA; ASCIDIACEA; STOLIDOBRANCHIA; 

OC PYURIDAE; HALOCYNTHIA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RA HORI S., SAITOH T., MATSUMOTO M., MAKABE K.W., NISHIDA H.;

RL DEV. GENES EVOL. 207:371-380(1997). 

DR EMBL; AB001327; D1026501; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 18.

DR PROSITE; PS01186; EGF_2; 22.

DR PROSITE; PS01187; EGF_CA; 18.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 2352 AA; 252623 MW; 816976D4 CRC32; 

Query Match 13.7%; Score 111; DB 5; Length 2352;

Best Local Similarity 54.8%; Pred. No. 1.29e-06; 
Matches 17; Conservative 5; Mismatches 5; Indels 4; Gaps 4; 

Db 846 QC-LDDVG-SYKCLCLPGFEGNNCQEEVNEC 874

II : II :| III III |::|hl I I
Qy 1 QCHISDQGEPY-CLCQPGFSGEHCQQE-NPC 29



RESULT 15 

ID O35442 PRELIMINARY; PRT; 1964 AA.

AC O35442;

DT 01-JAN-1998 (TREMBLREL. 05, CREATED) 

DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE NOTCH4. 

GN NOTCH4.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A.

RA ROWEN L., MAHAIRAS G., QIN S., AHEARN M.E., DANKERS C., LASKY S.,

RA LORETZ C., SCHMIDT S., TIPTON S., TRAICOFF R., ZACKRONE K., HOOD L.;

RL SUBMITTED (OCT-1997) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; AF030001; G2564947; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 11.

DR PROSITE; PS01186; EGF_2; 21.

DR PROSITE; PS01187; EGF_CA; 9.
Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 09:02:39 1999; MasPar time 8.30 Seconds

397.227 Million cell updates/sec

Tabular output not generated.
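
The banner above names the Smith-Waterman algorithm. As a rough illustration only, and not the implementation used by the search program, the sketch below computes a Smith-Waterman local alignment score with a simple match/mismatch scheme and a linear gap penalty. The function name and all parameter values are hypothetical; the run reported here used a PAM 150 table with a gap penalty of 11, which this toy scoring does not reproduce.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=2):
    """Best local alignment score of strings a and b (score only).

    Assumptions: identity scoring (match/mismatch) instead of a PAM
    matrix, and a linear gap penalty of `gap` per gap position.
    """
    prev = [0] * (len(b) + 1)   # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                0,                    # start a new local alignment
                prev[j - 1] + sub,    # align a[i-1] with b[j-1]
                prev[j] - gap,        # gap in b
                cur[j - 1] - gap,     # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman_score("CLCQPGF", "CLCLPGY"))
```

Each inner-loop iteration fills one cell of the dynamic-programming matrix, which is the unit counted in the "cell updates/sec" figure printed in the banner.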



Title:          MJS-09-191647-8
Description:    (1-155) from US09191647.pep
Perfect Score:  1092
Sequence:       1 RNPXICDCNLQWLAQINLQK FINDKSFEKLSKLRELXLND 155

Scoring table:  PAM 150
                Gap 11

170751 seqs, 21266608 residues 
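
The reported throughput can be sanity-checked from the figures in this header. The sketch below assumes that one "cell update" corresponds to one (query residue, database residue) pair, which is an interpretation rather than something stated in the report; the function name is hypothetical.

```python
def cell_updates_per_second(query_len, db_residues, seconds):
    """Throughput if every query residue is compared with every database residue."""
    return query_len * db_residues / seconds


# 155-residue query, 21266608 database residues, 8.30 s MasPar time (from this header).
rate = cell_updates_per_second(155, 21_266_608, 8.30)
print(f"{rate / 1e6:.1f} million cell updates/sec")
```

Under that assumption the result is about 397 million cell updates per second, in line with the 397.227 printed above; the small difference is plausibly due to the run time being rounded to two decimals.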



Post-processing: Minimum Match 0%

Listing first 45 summaries 



Database:
       1:part1 2:part2 3:part3 4:part4 5:part5 6:part6 7:part7
       8:part8 9:part9 10:part10 11:part11 12:part12 13:part13
       14:part14 15:part15 16:part16 17:part17 18:part18
       19:part19 20:part20 21:part21 22:part22 23:part23
       24:part24 25:part25 26:part26 27:part27 28:part28
       29:part29 30:part30 31:part31 32:part32 33:part33
       34:part34 35:part35 36:part36 37:part37 38:part38
       39:part39

Statistics: Mean 30.771; Variance 136.870; scale 0.225



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



                        SUMMARIES

Result         Query
   No.  Score  Match  Length  DB  ID      Description            Pred. No.

     1    469   42.9    1480   5  R25079  Drosophila SLIT prote  4.43e-32
     2    403   36.9    1534  30  W46966  Amino acid sequence o  5.07e-26
     3    197   18.0     369  15  R87951  Rat neurotrophic bigl  1.02e-07
     4    193   17.7     139   8  R42263  Decorin sequence PT-7  2.24e-07
     5    193   17.7     186   8  R42264  Decorin sequence PT-7  2.24e-07
     6    193   17.7     234   8  R42265  Decorin sequence PT-7  2.24e-07
     7    193   17.7     280   8  R42266  Decorin sequence PT-7  2.24e-07
     8    193   17.7     305   8  R42267  Decorin sequence PT-7  2.24e-07
     9    193   17.7     331   8  R42260  Mature decorin PT-65.  2.24e-07
    10    193   17.7     332  15  R87953  Bovine neurotrophic b  2.24e-07
    11    193   17.7     342  17  R89439  Human recombinant dec  2.24e-07
    12    193   17.7     353   1  R05160  Sequence of human bon  2.24e-07
    13    193   17.7     369  15  R87952  Human neurotrophic bi  2.24e-07
    14    193   17.7    1388  18  R89471  Collagen/decorin fusi  2.24e-07
    15    192   17.6     196   5  R29102  Drosophila SLIT prote  2.72e-07
    16    180   16.5     368   1  R05159  Sequence of human bon  2.78e-06
    17    161   14.7      94   8  R42262  Decorin sequence PT-7  1.05e-04
    18    161   14.7    1091  27  W41641  Sequence used in dete  1.05e-04
    19    142   13.0     111  30  K58846  Human AS209 1 secrete  3.66e-03
    20    140   12.8     433  18  R98454  Oligodendrocyte-myeli  5.30e-03
    21    139   12.7     345  23  W09405  [illegible]            6.37e-03
    22    139   12.7     560  12  R71294  Human glycoprotein V   6.37e-03
    23    136   12.5     661  39  W87556  B cell surface protei  1.10e-02
    24    136   12.5     661  25  W28510  [illegible]            1.10e-02
    25    136   12.5     661  28  W47274  Human B-cell activati  1.10e-02
    26    136   12.5    4303  17  R90302                         1.10e-02
    27    134   12.3     605  17  R85888  WD-40 domain-contg. i  1.59e-02
    28    132   12.1     245  28  W39170  Human PKD1 protein fr  2.29e-02
    29    132   12.1     455  28  W39171  Human PKD1 protein fr  2.29e-02
    30    132   12.1     784  39  W86350  Human DNAX toll-like   2.29e-02
    31    132   12.1     784  39  W90069  Human TNF-alpha conve  2.29e-02
    32    132   12.1     784  29  W48245  Human pro-tumour necr  2.29e-02
    33    132   12.1    4302  19  W00870  Polycystic kidney dis  2.29e-02
    34    132   12.1    4302  29  W33396  Human PKD1 polypeptid  2.29e-02
    35    132   12.1    4302  28  W23830  Human PKD1 protein     2.29e-02
    36    132   12.1    4339  15  R75916  Polycystic kidney dis  2.29e-02
    37    132   12.1    4339  19  R87539  Polycystic kidney dis  2.29e-02
    38    131   12.0     390  20  W06532  Gonadotropin receptor  2.74e-02
    39    124   11.4     695   5  R27558  FSHR.                  9.69e-02
    40    124   11.4    1045  39  W86354  Human DNAX toll-like   9.69e-02
    41    123   11.3     904  39  W86351  Human DNAX toll-like   1.16e-01
    42    122   11.2     634   6  R30520  N-terminal of LH rece  1.38e-01
    43    122   11.2     692   6  R30503  N-terminal of LH rece  1.38e-01
    44    122   11.2     695   6  R30524  N-terminal of LH rece  1.38e-01
    45    122   11.2     695   6  R30506  N-terminal of LH rece  1.38e-01



RESULT 1

ID R25079 standard; Protein; 1480 AA.
AC R25079;
DT 05-JAN-1993 (first entry)
DE Drosophila SLIT protein involved in axon pathway development.
KW Neurogenesis; EGF-like repeats; epidermal growth factor; TAGON;
KW embryonic CNS; leucine-rich repeat; Flank-LRR-Flank;
KW midline glial cells; axonogenesis; cell-cell interaction; ss.
OS Drosophila melanogaster.
FH Key Location/Qualifiers
FT peptide 1..36
FT /label= signal
FT domain 73..294
FT /label= Flank_LRR_Flank_1
FT /note= "mediates adhesive events"
FT domain 295..518
FT /label= Flank_LRR_Flank_2
FT /note= "mediates adhesive events"
FT domain 519..714
FT /label= Flank_LRR_Flank_3
FT /note= "mediates adhesive events"
FT domain 715..910
FT /label= Flank_LRR_Flank_4
FT /note= "mediates adhesive events"
FT region 911..1150
FT /label= Tandem_EGF_like_repeats
FT /note= "involved in protein-protein interactions"
FT region 1353..1393
FT /label= 7th_EGF_like_repeat
FT /note= "involved in receptor-ligand interactions"
FT region 1394..1404
FT /label= alternative_splice_segment
FT /note= "developmentally regulated"
FT region 1405..1480
FT /label= C-terminal region
PN WO9210518-A.
PD 25-JUN-1992.
PF 27-NOV-1991; U09055.
PR 07-DEC-1990; US-624135.
PA (OYYA ) UNIV YALE.
PI Artavanis-Tsakonas S, Rothberg JM;

DR WPI; 92-234590/28.

DR N-PSDB; Q25811.

PT SLIT protein and sequence elements for treating

PT neurodegenerative disease - useful for Alzheimer's disease,

PT nerve damage and Parkinson's disease, for diagnosis of cancer 

PS Claim 1; Page 84-89; 122pp; English. 

CC The SLIT protein is necessary for normal development of the midline 

CC of the CNS, partic. the midline glial cells, and for the 

CC concomitant formation of the commisural axon pathways. The process 

CC is dependent on the level of SLIT protein expression. It appears 

CC that SLIT protein is excreted by the midline glial cells where it 

CC is synthesised and is eventually associated with the surfaces of 

CC axons that traverse them. The SLIT protein is tightly localised to

CC the muscle attachment sites and to the sites of contact between 

CC adjacent pairs of cardioblasts as they coalesce to form the lumen 

CC of the larval heart. The SLIT protein defines a new set of 

CC molecules (TAGONS) which play a key role in axon outgrowth and 

CC pathfinding. SLIT can be used as a nerve regenerative in

CC neurodegenerative diseases such as Alzheimer's Disease, spinal cord

CC injuries, brain injuries, crushed (optic) nerve, amyotrophic lateral
CC sclerosis, diabetes-caused nerve damage, Parkinson's Disease,
CC strokes, epilepsy, multiple sclerosis, paraplegia, retinal

CC degeneration and facial nerve damage. The 4 "Flank-Leucine-rich 

CC region-Flank" domains, the C-terminal region and part of the 

CC alternative splice segment (i.e. GEGSTEPFTVT) are all individually 

CC claimed as are molecules comprising at least 1 Flank-LRR-Flank domain

CC and at least 1 EGF-like repeat element from the SLIT protein. 

CC See also R29102. 

SQ Sequence 1480 AA;

Query Match 42.9%; Score 469; DB 5; Length 1480;

Best Local Similarity 42.9%; Pred. No. 4.43e-32;

Matches 67; Conservative 42; Mismatches 44; Indels 3; Gaps 3;

[Alignment rows not legible; Db segments start at 451, 510 and 569, Qy segments at 1, 60 and 120.]



RESULT 2 

ID W46966 standard; Protein; 1534 AA.

AC W46966;

DT 06-JUL-1998 (first entry)

DE Amino acid sequence of a human slit-like polypeptide. 

KW Slit-like protein; human; diagnosis; treatment; brain-specific disease; 

KW cancer; antibody. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT Peptide 1..26

FT /note= "signal peptide"

FT Protein 27..1534

FT /note= "mature protein"

PN J10087699-A. 

PD 07-APR-1998. 

PF 15-JUL-1997; 205351.

PR 16-JUL-1996; JP-186219. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

DR WPI; 98-267127/24. 

DR N-PSDB; V16978. 

PT Human slit-like protein - useful for diagnosis and treatment of 

PT brain-specific diseases and cancers 

PS Disclosure; Pages 31-35; 45pp; Japanese. 

CC The present sequence represents a novel human slit-like protein (the 

CC mature protein is claimed in Claim 1). The slit-like polypeptide is 



CC useful for diagnosis and treatment of brain-specific diseases and 

CC cancers. Antibodies directed against the protein, or its fragments

CC can also be used for diagnosing cancer.

SQ Sequence 1534 AA;

Query Match 36.9%; Score 403; DB 30; Length 1534;

Best Local Similarity 39.4%; Pred. No. 5.07e-26; 

Matches 63; Conservative 40; Mismatches 49; Indels 8; Gaps 7; 

Db 438 qnpficdcnlkwladf-lrtnpietsgarcasprrlankrigqikskkfrcsakeqyfip 496

:|| llllll III:: I: I llllllll |:|| :|::: : ::||:| : | |:: 
Qy 1 RNPXICDCNLQWLAQINLQKN-IETSGARCEQPKRLRKKRFATLPPNKFKCKGSES-FVS 58 

Db 497 gtedyqlnsecnsdvvcphkcrceanvvecsslkltkiperipqstaelrlnnneisile 556

I :| I I :l II: |:|: I I ||: :::| |: |:|| :: 
Qy 59 -M-YA-DS-CFIDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVD 113 

Db 557 atgmfkklthlkkinlsnnkvseiedgafegaasvselhl 596 

: : I :| "MM :: hi :|| : : II I 
Qy 114 LNSNIHVLENLEXLDLSNNHITFINDKSFEKLSKLRELXL 153 
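
The percentage fields for this result appear to follow from the other printed numbers. The sketch below is a cross-check under two assumptions inferred from the figures, not taken from any documentation: Query Match is the score divided by the Perfect Score (1092 for this query), and Best Local Similarity is Matches divided by the total of Matches, Conservative, Mismatches and Indels. The function names are hypothetical.

```python
def query_match_percent(score, perfect_score):
    """Score as a percentage of the query's self-match (perfect) score."""
    return 100.0 * score / perfect_score


def local_similarity_percent(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all alignment columns."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Figures printed for RESULT 2 (W46966).
qm = query_match_percent(403, 1092)
sim = local_similarity_percent(63, 40, 49, 8)
print(qm, sim)
```

Both values land within rounding of the printed 36.9% and 39.4%, which supports, but does not prove, the assumed definitions.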



RESULT 3 

ID R87951 standard; Protein; 369 AA. 

AC R87951; 

DT 20-MAR-1996 (first entry) 

DE Rat neurotrophic biglycan.

KW Biglycan; proteoglycan; chondroitin sulphate; neuron protection;

KW neurotrophic; central nervous system; CNS; memory loss; dementia;

KW learning.

OS Rattus sp. 

FH Key Location/Qualifiers 

FT peptide 1..37 

FT /label= Sig_peptide

FT region 44..60

FT /label= Hypervariable_region

PN WO9530432-A1.

PD 16-NOV-1995. 

PF 09-MAY-1994; E01479. 

PR 09-MAY-1994; WO-E01479. 

PA (BOEF ) BOEHRINGER MANNHEIM GMBH. 

PI Hasenoehrl R, Huston J, Junghans U, Kappler J, Koops A; 

PI Mueller HW;

DR WPI; 95-403938/51.

DR N-PSDB; T08768.

PT Proteoglycan cpds. (partic. chondroitin sulphate proteoglycans) -

PT for maintaining structure and function of the CNS and attenuating

PT memory deficit(s) in the elderly and patients with dementia 

PS Claim 1; Page 44-45; 60pp; English. 

CC Rat biglycan (R87951) is a chondroitin sulphate proteoglycan with 

CC neurotrophic activity for brain neurons. Recombinant biglycan, 

CC obtd. by expression of encoding cDNA (T08768) in eukaryotic host 

CC cells, can be used to enhance the survival and maintain the structure 

CC and function of CNS neurons during normal ageing as well as after 

CC pathological and/or traumatic nervous system damage. It can also 

CC be used to restore function following nervous system lesions and 

CC degenerative diseases, and to improve learning efficiency and memory 

CC in the elderly and in patients with dementia. 

SQ Sequence 369 AA; 

Query Match 18.0%; Score 197; DB 15; Length 369; 

Best Local Similarity 34.8%; Pred. No. 1.02e-07; 

Matches 31; Conservative 23; Mismatches 34; Indels 1; Gaps 1; 

Db 60 fsamcpfgchchlrvvqcsdlglktvpkeispdttlldlqnndiselr-kddfkglqhly 118 

: ::ll I I l:|: 1 1 : 1 : 1 h :| I I hll : : :: |::| 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 119 alvlvnnkiskihekafsplrklqklyis 147 

Mill: l::|:| I II: I :: 

Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 
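
Each result carries semicolon-separated statistics lines such as "Query Match 18.0%; Score 197; DB 15; Length 369;". The sketch below shows one possible way to read such a line into a dictionary. It is an illustration only: `parse_stats` is a hypothetical helper, it assumes the "name value;" layout seen in this listing, and it does not attempt to repair OCR damage such as commas printed in place of decimal points.

```python
import re

# A field is "<name> <number>" with an optional trailing percent sign.
FIELD = re.compile(r"(.+?)\s+([-+0-9.eE]+)%?$")


def parse_stats(line):
    """Parse 'Name value; Name value;' into {name: float}."""
    fields = {}
    for part in line.split(";"):
        m = FIELD.match(part.strip())
        if not m:
            continue  # skip empty or unparseable pieces
        try:
            fields[m.group(1)] = float(m.group(2))
        except ValueError:
            continue  # value was not a number after all
    return fields


print(parse_stats("Query Match 18.0%; Score 197; DB 15; Length 369;"))
print(parse_stats("Best Local Similarity 34.8%; Pred. No. 1.02e-07;"))
```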
RESULT 4

ID R42263 standard; Protein; 139 AA. 

AC R42263; 

DT 28-APR-1994 (first entry) 

DE Decorin sequence PT-74 (N-terminal to LRR4). 

KW leucine-rich repeat; proteoglycan; cell regulatory factor; MBP; 

KW fusion protein; maltose binding protein; tumour growth; inhibition; 

KW decorin; PG-II; PG-40; transforming growth factor-beta; TGF-beta. 

PN WO9320202-A. 

PD 14-OCT-1993. 

PF 02-APR-1993; U03171. 

PR 03-APR-1992; US-865652. 

PA (LJOL-) LA JOLLA CANCER RES FOUND. 

PI Cardenas J, Craig W, Mullen DG, Pierschbacher MD; 

PI Ruoslahti EI; 

DR WPI; 93-336910/42.

DR N-PSDB; Q50049.

PT Active fragments of protein esp. decorin - with cell regulatory

PT factor domain, useful for inhibiting cell regulatory factor

PT activity

PS Claim 10; Page 41; 77pp; English. 

CC Active fragments of decorin (full-length coding sequence Q50046) 

CC were generated by PCR and fused to Maltose Binding Protein. The

CC resulting fusion proteins were useful for inhibiting the activity of

CC a cell regulatory factor, esp. TGF-beta, and hence for treating

CC conditions associated with over-activity of the growth factor such

CC as certain tumours.

SQ Sequence 139 AA; 

Query Match 17.7%; Score 193; DB 8; Length 139; 

Best Local Similarity 32.6%; Pred. No. 2.24e-07; 

Matches 29; Conservative 23; Mismatches 36; Indels 1; Gaps 1; 

Db 22 lgpvcpfrcqchlrvvqcsdlgldkvpkdlppdttlldlqnnkiteik-dgdfknlknlh 80 

: "I! MM \-\- W- -\ '\ Mil \-\ I II 

Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125

Db 81 alilvnnkiskvspgaftplvklerlyls 109 

I I II I: :: :| I II I I: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154



RESULT 5 

ID R42264 standard; Protein; 186 AA. 

AC R42264; 

DT 28-APR-1994 (first entry)

DE Decorin sequence PT-75 (N-terminal to LRR6).

KW leucine-rich repeat; proteoglycan; cell regulatory factor; MBP;

KW fusion protein; maltose binding protein; tumour growth; inhibition;

KW decorin; PG-II; PG-40; transforming growth factor-beta; TGF-beta.

PN WO9320202-A. 

PD 14-OCT-1993. 

PF 02-APR-1993; U03171.

PR 03-APR-1992; US-865652.

PA (LJOL-) LA JOLLA CANCER RES FOUND.

PI Cardenas J, Craig W, Mullen DG, Pierschbacher MD;

PI Ruoslahti EI;

DR WPI; 93-336910/42.

DR N-PSDB; Q50050. 

PT Active fragments of protein esp. decorin - with cell regulatory 

PT factor domain, useful for inhibiting cell regulatory factor 

PT activity 

PS Claim 10; Page 43-44; 77pp; English.

CC Active fragments of decorin (full-length coding sequence Q50046) 

CC were generated by PCR and fused to Maltose Binding Protein. The 

CC resulting fusion proteins were useful for inhibiting the activity of 

CC a cell regulatory factor, esp. TGF-beta, and hence for treating 

CC conditions associated with over-activity of the growth factor such

CC as certain tumours. 

SQ Sequence 186 AA; 

Query Match 17.7%; Score 193; DB 8; Length 186; 

Best Local Similarity 32.6%; Pred. No. 2.24e-07; 



Matches 29; Conservative 23; Mismatches 36; Indels 1; Gaps 1; 

Db 22 lgpvcpfrcqchlrvvqcsdlgldkvpkdlppdttlldlqnnkiteik-dgdfknlknlh 80 

: ::|| :|:| |:|: ||: :| :| :| I I |:|: : :::: I II 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 81 alilvnnkiskvspgaftplvklerlyls 109 

Mill: :: :| I III I: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 6 

ID R42265 standard; Protein; 234 AA. 

AC R42265; 

DT 28-APR-1994 (first entry) 

DE Decorin sequence PT-76 (N-terminal to LRR8), 

KW leucine-rich repeat; proteoglycan; cell regulatory factor; MBP; 

KW fusion protein; maltose binding protein; tumour growth; inhibition; 

KW decorin; PG-II; PG-40; transforming growth factor-beta; TGF-beta.

PN WO9320202-A. 

PD 14-OCT-1993. 

PF 02-APR-1993; U03171. 

PR 03-APR-1992; US-865652. 

PA (LJOL-) LA JOLLA CANCER RES FOUND. 

PI Cardenas J, Craig W, Mullen DG, Pierschbacher MD; 

PI Ruoslahti EI; 

DR WPI; 93-336910/42. 

DR N-PSDB; Q50051.

PT Active fragments of protein esp. decorin - with cell regulatory 

PT factor domain, useful for inhibiting cell regulatory factor 

PT activity 

PS Claim 10; Page 45-46; 77pp; English. 

CC Active fragments of decorin (full-length coding sequence Q50046) 

CC were generated by PCR and fused to Maltose Binding Protein. The 

CC resulting fusion proteins were useful for inhibiting the activity of 

CC a cell regulatory factor, esp. TGF-beta, and hence for treating

CC conditions associated with over-activity of the growth factor such 

CC as certain tumours.

SQ Sequence 234 AA; 

Query Match 17.7%; Score 193; DB 8; Length 234;

Best Local Similarity 32.6%; Pred. No. 2.24e-07; 

Matches 29; Conservative 23; Mismatches 36; Indels 1; Gaps 1; 

Db 22 lgpvcpfrcqchlrvvqcsdlgldkvpkdlppdttlldlqnnkiteik-dgdfknlknlh 80 

: ::H :|:| |:|: lh :| :| :| I I |:|: : :::: I II 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 81 alilvnnkiskvspgaftplvklerlyls 109 

I I II I: :| I II I |: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 7 

ID R42266 standard; Protein; 280 AA. 

AC R42266; 

DT 28-APR-1994 (first entry) 

DE Decorin sequence PT-77 (N-terminal to LRR10). 

KW leucine-rich repeat; proteoglycan; cell regulatory factor; MBP; 

KW fusion protein; maltose binding protein; tumour growth; inhibition; 

KW decorin; PG-II; PG-40; transforming growth factor-beta; TGF-beta. 

PN WO9320202-A. 

PD 14-OCT-1993. 

PF 02-APR-1993; U03171.

PR 03-APR-1992; US-865652. 

PA (LJOL-) LA JOLLA CANCER RES FOUND. 

PI Cardenas J, Craig W, Mullen DG, Pierschbacher MD; 

PI Ruoslahti EI; 

DR WPI; 93-336910/42. 

DR N-PSDB; Q50052. 

PT Active fragments of protein esp. decorin - with cell regulatory 

PT factor domain, useful for inhibiting cell regulatory factor 

PT activity 
PS Claim 10; Page 47-48; 77pp; English.

CC Active fragments of decorin (full-length coding sequence Q50046)

CC were generated by PCR and fused to Maltose Binding Protein. The

CC resulting fusion proteins were useful for inhibiting the activity of 

CC a cell regulatory factor, esp. TGF-beta, and hence for treating 

CC conditions associated with over-activity of the growth factor such 

CC as certain tumours. 

SQ Sequence 280 AA; 

Query Match 17.7%; Score 193; DB 8; Length 280;

Best Local Similarity 32.6%; Pred. No. 2.24e-07;

Matches 29; Conservative 23; Mismatches 36; Indels 1; Gaps 1; 

Db 22 lgpvcpfrcqchlrvvqcsdlgldkvpkdlppdttlldlqnnkiteik-dgdfknlknlh 80 

: -II :hl hh II: :| :| :| I I |:|: : :::: I || 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 81 alilvnnkiskvspgaftplvklerlyls 109 

I I II I: :: :| III I |: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154



RESULT 8

ID R42267 standard; Protein; 305 AA.

AC R42267; 

DT 28-APR-1994 (first entry) 

DE Decorin sequence PT-78 (N-terminal to half C-terminal). 

KW leucine-rich repeat; proteoglycan; cell regulatory factor; MBP; 

KW fusion protein; maltose binding protein; tumour growth; inhibition; 

KW decorin; PG-II; PG-40; transforming growth factor-beta; TGF-beta.

PN WO9320202-A. 

PD 14-OCT-1993. 

PF 02-APR-1993; U03171. 

PR 03-APR-1992; US-865652.

PA (LJOL-) LA JOLLA CANCER RES FOUND.

PI Cardenas J, Craig W, Mullen DG, Pierschbacher MD;

PI Ruoslahti EI;

DR WPI; 93-336910/42.

DR N-PSDB; Q50053. 

PT Active fragments of protein esp. decorin - with cell regulatory 

PT factor domain, useful for inhibiting cell regulatory factor 

PT activity 

PS Claim 10; Page 49-50; 77pp; English.

CC Active fragments of decorin (full-length coding sequence Q50046)

CC were generated by PCR and fused to Maltose Binding Protein. The

CC resulting fusion proteins were useful for inhibiting the activity of 

CC a cell regulatory factor, esp. TGF-beta, and hence for treating 

CC conditions associated with over-activity of the growth factor such 

CC as certain tumours.

SQ Sequence 305 AA;

Query Match 17.7%; Score 193; DB 8; Length 305;

Best Local Similarity 32.6%; Pred. No. 2.24e-07;

Matches 29; Conservative 23; Mismatches 36; Indels 1; Gaps 1;

Db 22 lgpvcpfrcqchlrvvqcsdlgldkvpkdlppdttlldlqnnkiteik-dgdfknlknlh 80

: ::H :|:| |:|: ||: :| :| :| I I |:|: : :::: | ||
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125

Db 81 alilvnnkiskvspgaftplvklerlyls 109 

I I II h :: :| I III |: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 9

ID R42260 standard; Protein; 331 AA.

AC R42260; 

DT 28-APR-1994 (first entry) 

DE Mature decorin PT-65. 

KW leucine-rich repeat; proteoglycan; cell regulatory factor; MBP; 

KW fusion protein; maltose binding protein; tumour growth; inhibition; 

KW decorin; PG-II; PG-40; transforming growth factor-beta; TGF-beta. 

FH Key Location/Qualifiers 



FT region 1..45

FT /label= N-terminal_region

FT /note= "contains 4 Cys residues"

FT region 46..280

FT /label= repeat_region

FT /note= "contains 10 leucine-rich repeats"

FT region 281..331

FT /label= C-terminal_region



PN WO9320202-A. 

PD 14-OCT-1993.

PF 02-APR-1993; U03171.

PR 03-APR-1992; US-865652.

PA (LJOL-) LA JOLLA CANCER RES FOUND.

PI Cardenas J, Craig W, Mullen DG, Pierschbacher MD; 

PI Ruoslahti EI; 

DR WPI; 93-336910/42. 

DR N-PSDB; Q50046.

PT Active fragments of protein esp. decorin - with cell regulatory 

PT factor domain, useful for inhibiting cell regulatory factor 

PT activity 

PS Claim 10; Page 36-38; 77pp; English.

CC Active fragments of decorin (full-length coding sequence Q50046)

CC were generated by PCR and fused to Maltose Binding Protein. The

CC resulting fusion proteins were useful for inhibiting the activity of 

CC a cell regulatory factor, esp. TGF-beta, and hence for treating 

CC conditions associated with over-activity of the growth factor such 

CC as certain tumours. 

SQ Sequence 331 AA;

Query Match 17.7%; Score 193; DB 8; Length 331;

Best Local Similarity 32.6%; Pred. No. 2.24e-07;

Matches 29; Conservative 23; Mismatches 36; Indels 1; Gaps 1;

Db 22 lgpvcpfrcqchlrvvqcsdlgldkvpkdlppdttlldlqnnkiteik-dgdfknlknlh 80

: ::H :|:| |:|: lh :| :| :| I I |:|: : I || 

Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 81 alilvnnkiskvspgaftplvklerlyls 109 

Mill: :: :| I II I |: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 10

ID R87953 standard; Protein; 332 AA.

AC R87953;

DT 20-MAR-1996 (first entry)

DE Bovine neurotrophic biglycan.

KW Biglycan; proteoglycan; chondroitin sulphate; neuron protection; 

KW neurotrophic; central nervous system; CNS; memory loss; dementia; 

KW learning. 

OS Bos taurus. 

FH Key Location/Qualifiers 

FT region 7..23

FT /label= Hypervariable_region

PN WO9530432-A1. 

PD 16-NOV-1995. 

PF 09-MAY-1994; E01479. 

PR 09-MAY-1994; WO-E01479. 

PA (BOEF ) BOEHRINGER MANNHEIM GMBH.

PI Hasenoehrl R, Huston J, Junghans U, Kappler J, Koops A;

PI Mueller HW; 

DR WPI; 95-403938/51. 

PT Proteoglycan cpds., partic. chondroitin sulphate proteoglycan (s) - 

PT for maintain structural and function of the CNS and attenuating 

PT memory deficit(s) in the elderly and patients with dementia 

PS Claim 3; Fig 8; 60pp; English.

CC Bovine biglycan (R87953) is a chondroitin sulphate proteoglycan with 

CC neurotrophic activity for brain neurons. It can be used to enhance 

CC the survival and maintain the structure and function of CNS neurons 

CC during normal ageing as well as after pathological and/or traumatic

CC nervous system damage. It can also be used to restore function

CC following nervous system lesions and degenerative diseases, and to 

CC improve learning efficiency and memory in the elderly and in patients 
CC with dementia.

SQ Sequence 332 AA;

Query Match 17.7%; Score 193; DB 15; Length 332;

Best Local Similarity 34.5%; Pred. No. 2.24e-07;

Matches 30; Conservative 23; Mismatches 33; Indels 1; Gaps 1; 

Db 25 amcpfgchchlrvvqcsdlglkavpkeispdttlldlqnndiselr-kddfkglqhlyal 83 

::M I I 1:1: ||:::| |: :| I I |:|| : : :: |::| | 
Qy 68 SICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLEXL 127 

Db 84 vlvnnkiskihekafsplrklqklyis 110 

I II I: l::hl I II: I :: 
Qy 128 DLSNNHITFINDKSFEKLSKLRELXLN 154



RESULT 11

ID R89439 standard; Protein; 342 AA.

AC R89439;

DT 20-AUG-1996 (first entry)

DE Human recombinant decorin. 

KW Decorin; PG-II; PG-40; proteoglycan; guanidinium ion. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT peptide 1..14

FT /label= Sig_peptide

PN WO9601842-A1.

PD 25-JAN-1996.

PF 07-JUL-1995; U08542.

PR 08-JUL-1994; US-272919.

PA (LJOL-) LA JOLLA CANCER RES FOUND. 

PI Craig WS, Harper JR, Hernandez SD, Kostel PJ, Parker JR; 

PI Vedvick IS; 

DR WPI; 96-097586/10. 

DR N-PSDB; T10741. 

PT Purificn. of human recombinant decorin - using a strong anion

PT exchange resin, a hydrophobic interaction chromatography resin and a 

PT strong anion exchange resin 

PS Disclosure; Fig 1A-D; 55pp; English. 

CC Human recombinant decorin (R89439) was obtd. by expression of a 

CC cDNA clone (T10741) in CHO host cells. Decorin (or PGII or PG-40)

CC is a proteoglycan having a 40 kDa core protein. Recombinant

CC decorin can be produced by cotransfection of CHO-DG44 cells with

CC pSV2-decorin and pSV2dhfr. Large-scale cultures can be performed

CC using CHO cells attached to microcarrier beads. The recombinant

CC protein is purified from the cells using a 3-step chromatographic

CC procedure. It can be used for the highly sensitive detection of

CC guanidinium ions (ppm range), partic. in protein-contg. solns.

CC purified using GuHCl, and also has therapeutic applns.

SQ Sequence 342 AA; 

Query Match 17.7%; Score 193; DB 17; Length 342;

Best Local Similarity 32.6%; Pred. No. 2.24e-07;

Matches 29; Conservative 23; Mismatches 36; Indels 1; Gaps 1;

Db 34 lgpvcpfrcqchlrvvqcsdlgldkvpkdlppdttlldlqnnkiteik-dgdfknlknlh 92

: ::H :|:| hi: ||: :| :| :| I I |:|: : :::: | ||
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125



Db 93 alilvnnkiskvspgaftplvklerlyls 121 

I I II I: :: :| I II I |: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 12

ID R05160 standard; Protein; 353 AA.

AC R05160;

DT 09-OCT-1990 (first entry)

DE Sequence of human bone proteoglycan II (decorin). 

KW Osteoporosis; rheumatoid arthritis; Paget's disease; 

KW atherosclerosis; periodontal; human bone matrix; proteoglycan. 

OS Homo sapiens. 

PN US7432044-A.

PD 17-APR-1990.

PF 3-NOV-1989; 432044.

PR 3-NOV-1989; US-432044.

PA (USSH) Nat Inst of Health.

PI Termine J; 

DR WPI; 90-178641/23. 

DR N-PSDB; Q04491. 

PT Human bone matrix DNA and proteins - 

PT used in detection, diagnosis and treatment involving skeletal 

PT and/or connective tissue disease states. 

PS Disclosure; p; English.

CC Probes and Abs raised to the proteins can be used to determine

CC their levels useful in diagnosis of associated connective tissue

CC diseases states such as osteoporosis, osteo/rheumatoid arthritis,

CC Paget's disease, atherosclerosis and periodontal disease.

CC Proteins may also be used to induce or block biological function.

SQ Sequence 353 AA; 

Query Match 17.7%; Score 193; DB 1; Length 353;

Best Local Similarity 32.6%; Pred. No. 2.24e-07;

Matches 29; Conservative 23; Mismatches 36; Indels 1; Gaps 1;



Db 44 lgpvcpfrcqchlrvvqcsdlgldkvpkdlppdttlldlqnnkiteik-dgdfknlknlh 102 

: ::M :|:| |:|: ||: :| :| :| I I |:|: : :::: | || 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 103 alilvnnkiskvspgaftplvklerlyls 131 

I I II I: :: :| I III I: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 13 

ID R87952 standard; Protein; 369 AA. 

AC R87952; 

DT 20-MAR-1996 (first entry) 

DE Human neurotrophic biglycan. 

KW Biglycan; proteoglycan; chondroitin sulphate; neuron protection; 

KW neurotrophic; central nervous system; CNS; memory loss; dementia; 

KW learning. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT peptide 1..37 

FT /label= Sig_peptide

FT region 44..60

FT /label= Hypervariable_region

PN WO9530432-A1. 

PD 16-NOV-1995. 

PF 09-MAY-1994; E01479. 

PR 09-MAY-1994; WO-E01479. 

PA (BOEF ) BOEHRINGER MANNHEIM GMBH.

PI Hasenoehrl R, Huston J, Junghans U, Kappler J, Koops A; 

PI Mueller HW; 

DR WPI; 95-403938/51. 

PT Proteoglycan cpds., partic. chondroitin sulphate proteoglycan (s) - 

PT for maintain structural and function of the CNS and attenuating 

PT memory deficit(s) in the elderly and patients with dementia 

PS Claim 3; Fig 8; 60pp; English. 

CC Human biglycan (R87952) is a chondroitin sulphate proteoglycan with 

CC neurotrophic activity for brain neurons. It can be used to enhance 

CC the survival and maintain the structure and function of CNS neurons 

CC during normal ageing as well as after pathological and/or traumatic 

CC nervous system damage. It can also be used to restore function 

CC following nervous system lesions and degenerative diseases, and to 

CC improve learning efficiency and memory in the elderly and in patients 

CC with dementia. 

SQ Sequence 369 AA; 

Query Match 17.7%; Score 193; DB 15; Length 369;

Best Local Similarity 34.5%; Pred. No. 2.24e-07;

Matches 30; Conservative 23; Mismatches 33; Indels 1; Gaps 1;

Db 62 amcpfgchchlrvvqcsdlglksvpkeispdttlldlqnndiselr-kddfkglqhlyal 120

::ll I I 1:1: l|:::| |: :| I I |:|| : : :: |::| |
Qy 68 SICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLEXL 127

Db 121 vlvnnkiskihekafsplrklqklyis 147

I II I: I::hl I II: I ::
Qy 128 DLSNNHITFINDKSFEKLSKLRELXLN 154



RESULT 14 

ID R89471 standard; Protein; 1388 AA. 

AC R89471; 

DT 01-OCT-1996 (first entry) 

DE Collagen/decorin fusion protein.

KW Transforming growth factor; TGF-beta-1; collagen IA; osteogenesis; 

KW bone formation; tissue repair; fusion protein. 

OS Synthetic. 

FH Key Location/Qualifiers 

FT domain 1..1057

FT /label= Collagen-IA

FT /note= "collagen IA alpha-helical domain"

FT peptide 1058..1059

FT /label= Linker_peptide

FT domain 1060..1388

FT /label= Decorin

FT misc_difference 887

FT /note= "unidentified amino acid"

FT misc_difference 890

FT /note= "unidentified amino acid"

PN CA2151547-A. 

PD 11-DEC-1995.

PF 12-JUN-1995; 151547. 

PR 10-JUN-1994; US-259263. 

PA (USSU ) US SURGICAL CORP. 

PI Espino P, Gruskin EA; 

DR WPI; 96-140144/15. 

DR N-PSDB; T16517. 

PT Chimaeric DNA encoding protein contg. extracellular matrix protein

PT domain - and cellular regulatory factor domain, partic. useful as 

PT osteogenic agents, also related vectors, transformed cells and 

PT chimaeric proteins. 

PS Disclosure; Fig 7; 59pp; English. 

CC A fusion protein (R89471) comprises the alpha-helical region of 

CC human collagen 1(a) linked to human dermatan sulphate proteoglycan 

CC (decorin). It can be expressed in Escherichia coli transformants

CC carrying a vector incorporating a chimeric gene (T16517) coding for

CC the fusion. The decorin binds to type I collagen and thus affects

CC fibril formation. It inhibits the cell attachment-promoting

CC activity of collagen and fibrinogen by binding to such molecules

CC near their cell binding sites. The collagen moiety provides an

CC integral substratum or scaffolding for the decorin. The fusion

CC protein acts to reduce scarring of healing tissue.

SQ Sequence 1388 AA;

Query Match 17.7%; Score 193; DB 18; Length 1388;

Best Local Similarity 32.6%; Pred. No. 2.24e-07;

Matches 29; Conservative 23; Mismatches 36; Indels 1; Gaps 1; 

Db 1079 lgpvcpfrcqchlrvvqcsdlgldkvpkdlppdttlldlqnnkiteik-dgdfknlknlh 1137 

: "II :|:l |:|: II: :l :| :| I I |:|: : :::: I II 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 1138 alilvnnkiskvspgaftplvklerlyls 1166 

II II h :: Mill |: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 15

ID R29102 standard; Protein; 196 AA.

AC R29102;

DT 05-JAN-1993 (first entry)

DE Drosophila SLIT protein Flank-LRR-Flank consensus sequence.

KW Neurogenesis; EGF-like repeats; epidermal growth factor; TAGON;

KW embryonic CNS; leucine-rich repeat; midline glial cells;

KW axonogenesis; cell-cell interaction; ss.

OS Drosophila melanogaster.

FH Key Location/Qualifiers

FT region 1..32

FT /label= amino_flanking_region

FT region 33..135

FT /label= Leucine_rich_repeat_region

FT region 136..196

FT /label= carboxy_flanking_region

PN WO9210518-A.

PD 25-JUN-1992.

PF 27-NOV-1991; U09055.

PR 07-DEC-1990; US-624135.

PA (UYYA ) UNIV YALE.

PI Artavanis-Tsakonas S, Rothberg JM;

DR WPI; 92-234590/28.

PT SLIT protein and sequence elements for treating

PT neurodegenerative disease - useful for Alzheimer's disease,

PT nerve damage and Parkinson's disease, for diagnosis of cancer

PS Claim 5; Page 95; 122pp; English.

CC The SLIT protein is necessary for normal development of the midline

CC of the CNS, partic. the midline glial cells, and for the

CC concomitant formation of the commissural axon pathways. The process

CC is dependent on the level of SLIT protein expression. It appears

CC that SLIT protein is excreted by the midline glial cells where it

CC is synthesised and is eventually associated with the surfaces of

CC axons that traverse them. The SLIT protein is tightly localised to

CC the muscle attachment sites and to the sites of contact between

CC adjacent pairs of cardioblasts as they coalesce to form the lumen

CC of the larval heart. The SLIT protein defines a new set of

CC molecules (TAGONS) which play a key role in axon outgrowth and

CC pathfinding. SLIT contains 4 Flank-LRR-Flank domains (see R25079)

CC which mediate adhesive events and this consensus sequence is based

CC on them. Each of the 4 individual domain sequences is individually

CC claimed. See also Q25811.

SQ Sequence 196 AA;

Query Match 17.6%; Score 192; DB 5; Length 196;

Best Local Similarity 31.0%; Pred. No. 2.72e-07;

Matches 26; Conservative 4; Mismatches 53; Indels 1; Gaps 1;

Db 1 cpxxcxcxgxxvdcxxxqlxxxpxxxpxdttxxxxxxnxixxlxxxx-fxxlxxlxxlxl 59

II I I I III II: ||: : I I IN

Qy 70 CPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLEXLDL 129

Db 60 xxnxixxlxxxxfxxlxxlxxlil 83

II: I I I I I

Qy 130 SNNHITFINDKSFEKLSKLRELXL 153



Search completed: Fri May 28 09:03:40 1999
Job time : 61 secs.
Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 09:03:57 1999; MasPar time 9.06 Seconds

685.804 Million cell updates/sec

Tabular output not generated.
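The throughput figure in this header can be cross-checked from numbers printed elsewhere in the same header. The sketch below assumes that a "cell update" is one cell of the query-by-database dynamic-programming matrix, so the total is query length (155) times database residues (40068593), divided by the reported MasPar time. The function name is illustrative and not part of the search program.

```python
def cell_updates_per_second(query_length: int, db_residues: int, seconds: float) -> float:
    """Dynamic-programming cells filled per second for one query against a database.

    Assumption: one cell update = one (query residue, database residue) pair.
    """
    return query_length * db_residues / seconds


# Values taken from this header: 155-residue query, 40068593 residues, 9.06 s.
rate = cell_updates_per_second(155, 40_068_593, 9.06)
print(f"{rate / 1e6:.1f} million cell updates/sec")
```

The result lands close to the reported 685.804 million; the small difference is presumably due to the run time being printed to only two decimal places.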



Title:          US-09-191-647-8
Description:    (1-155) from US09191647.pep
Perfect Score:  1092
Sequence:       1 RNPXICDCNLQWLAQINLQK FINDKSFEKLSKLRELXLND 155

Scoring table:  PAM 150
                Gap 11

Searched:       122810 seqs, 40068593 residues
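The "Query Match" percentage reported for each hit appears to be the hit's score divided by the Perfect Score given above (1092 for this query). This relationship is inferred from the numbers in the report rather than stated by it, so the sketch below is a consistency check under that assumption; the function name is illustrative.

```python
def query_match_percent(score: int, perfect_score: int) -> float:
    """Score as a percentage of the query's self-match (perfect) score."""
    return round(100.0 * score / perfect_score, 1)


PERFECT_SCORE = 1092  # from the header of this search

# Scores 483, 201 and 193 are reported with Query Match 44.2%, 18.4% and 17.7%.
for score in (483, 201, 193):
    print(score, query_match_percent(score, PERFECT_SCORE))
```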



Post-processing: Minimum Match 0%

Listing first 45 summaries

Database: pir60

1:pir1 2:pir2 3:pir3 4:pir4

Statistics: Mean 43.006; Variance 90.646; scale 0.474

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
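The scores that feed this distribution come from Smith-Waterman local alignment of the query against each database sequence. The following is a minimal sketch of that recurrence, not the MasPar implementation: it assumes a toy match/mismatch scheme in place of the PAM 150 table (which is not reproduced in this report) and treats "Gap 11" as a linear per-residue penalty, which is also an assumption.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = 11) -> int:
    """Best local alignment score between a and b (score only, no traceback).

    Assumptions: simple match/mismatch scoring instead of PAM 150,
    and a linear gap penalty of `gap` per gapped position.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for ca in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] - gap        # gap in b
            left = curr[j - 1] - gap  # gap in a
            cell = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# A fragment of the query aligned to itself scores match * length.
print(smith_waterman_score("CPTQCDCYGTT", "CPTQCDCYGTT"))
```

Each inner-loop iteration fills one matrix cell, which is the unit counted by the "cell updates/sec" figure in the header.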



            Query
 No. Score  Match Length DB  ID      Description             Pred. No.
   1   483   44.2   1469  2  B36665  slit protein 2 precur   6.23e-67
   2   483   44.2   1480  2  A36665  slit protein 1 precur   6.23e-67
   3   201   18.4    420  2  A53531  oncofetal trophoblast   3.72e-16
   4   197   18.0    369  2  S20811  proteoglycan I - mous   1.69e-15
   5   197   18.0    369  2  S32793  biglycan precursor -    1.69e-15
   6   195   17.9    360  2  S06280  decorin precursor - b   3.59e-15
   7   195   17.9    360  2  I47020  decorin - rabbit        3.59e-15
   8   193   17.7    359  1  NBHUC8  decorin precursor - h   7.61e-15
   9   193   17.7    368  1  BGHUN   biglycan precursor -    7.61e-15
  10   193   17.7    369  2  S32559  biglycan precursor -    7.61e-15
  11   192   17.6    357  2  S24317  decorin precursor - c   1.11e-14
  12   184   16.8    322  2  S72271  proteoglycan Lb precu   2.18e-13
  13   184   16.8    354  2  A55454  decorin precursor - m   2.18e-13
  14   183   16.8    354  2  S29145  decorin precursor - r   3.16e-13
  15   182   16.7    316  2  A41781  proteoglycan-Lb - chi   4.57e-13
  16   162   14.8    208  2  JN0638  platelet glycoprotein   6.54e-10
  17   161   14.7   1091  2  A58532  glial cell membrane g   9.34e-10
  18   159   14.6    206  1  NBHUIB  platelet glycoprotein   1.90e-09
  19   159   14.6    343  2  A41748  lumican precursor - c   1.90e-09
  20   159   14.6    411  1  I55604  platelet glycoprotein   1.90e-09
  21   159   14.6    907  2  JE0176  orphan G protein-coup   1.90e-09
  22   156   14.3   1535  2  S46224  peroxidasin - fruit f   5.48e-09
  23   147   13.5    662  2  S42799  garp precursor - huma   1.26e-07
  24   144   13.2    661     I56258  RP105 - mouse           3.55e
  25   142   13.0    440     A47530  oligodendrocyte-myeli   7.02e
  26   140   12.8    440     A39613  oligodendrocyte-myeli   1.38e
  27   139   12.7    230     I46918  leucine-rich glycopro   1.94e
  28   139   12.7    560     A60164                          1.94e
  29   136   12.5    682     A43318  connectin precursor -   5.32e
  30   136   12.5    682     A49121  cell-surface molecule   5.32e
  31   134   12.3    605     A41915  insulin-like growth f   1.04e
  32   132   12.1    298     JC4130  osteoglycin precursor   2.01e
  33   132   12.1   4302     A38971  polycystic kidney dis   2.01e
  34   131   12.0   1134     A29944  chaoptin precursor -    2.80e
  35   130   11.9    298     B35272  osteoinductive factor   3.89e
  36   130   11.9    299     A35272  osteoinductive factor   3.89e
  37   130   11.9    338     S52284  lumicon, secretory in   3.89e
  38   130   11.9    380     S71876  fibromodulin - chicke   3.89e
  39   128   11.7    342     A46743  lumican precursor - b   7.49e
  40   128   11.7    382     I39068  proline-arginine-ric    7.49e
  41   128   11.7   1115     S40241  G protein-coupled rec   7.49e
  42   127   11.6    361     A53860  chondroadherin precur   1.04e
  43   127   11.6    603     JC6128  insulin-like growth f   1.04e
  44   124   11.4    605     JC5239  insulin-like growth f   2.74e
  45   123   11.3    536     A34901  lysine carboxypeptida   3.77e


RESULT 1 

ENTRY B36665 #type complete

TITLE slit protein 2 precursor - fruit fly (Drosophila

melanogaster)

ORGANISM #formal_name Drosophila melanogaster

DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change

16-Dec-1998
ACCESSIONS B36665
REFERENCE A36665

#authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.;

Artavanis-Tsakonas, S.
#journal Genes Dev. (1990) 4:2169-2187

#title slit: an extracellular protein necessary for development of
midline glia and commissural axon pathways contains both
EGF and LRR domains.
#cross-references MUID:91099665
#accession B36665

##status preliminary
##molecule_type mRNA
##residues 1-1469 ##label ROT
##cross-references GB:X53959
GENETICS

#gene FlyBase:sli

##cross-references FlyBase:FBgn0003425
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF
homology; leucine-rich alpha-2-glycoprotein repeat
homology; proteoglycan carboxyl-terminal homology

FEATURE

66-91 #domain proteoglycan amino-terminal homology #label PAH1\
101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\
288-313 #domain proteoglycan amino-terminal homology #label PAH2\
323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\
512-537 #domain proteoglycan amino-terminal homology #label PAH3\
547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\
708-733 #domain proteoglycan amino-terminal homology #label PAH4\
743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\
1028-1061 #domain EGF homology #label EGF

SUMMARY #length 1469 #molecular-weight 164695 #checksum 8361



Query Match 44.2%; Score 483; DB 2; Length 1469;

Best Local Similarity 44.2%; Pred. No. 6.23e-67;

Matches 69; Conservative 40; Mismatches 44; Indels 3; Gaps 3;

Db 451 KNPFICDCNLRWLAD-YLHKNPIETSGARCESPKRMHRRRIESLREEKFKCSWGELRMKL 509
:H lllllhlll: Ml MINIM MM::::: ;| :|||| M : :
Qy 1 RNPXICDCNLQWLAQINLQKN-IETSGARCEQPKRLRRKKFAILPPNKFKCKGSESFVSM 59

Db 510 SGE-CRMDSDCPAMCHCEGTTVDCTGRRLKEIPRDIPLHTTELLLNDNELGRISSDGLPG 568

:: I :ll II: I I HUM I M II || :MMM |::: : :; ;
Qy 60 YADSCFIDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIH 119

Db 569 RLPHLVKLELKRNQLTGIEPNAFEGASHIQELQLGE 604

I :| hi MM |: ::|| | ::|| | :
Qy 120 VLENLEXLDLSNNHITFINDKSFEKLSKLRELXLND 155
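The "Best Local Similarity" figure for this alignment is consistent with the number of identical positions divided by the total number of alignment columns (matches, conservative substitutions, mismatches and indels). That formula is inferred from the reported counts, not stated in the report, so the sketch below should be read as a check under that assumption; the function name is illustrative.

```python
def best_local_similarity(matches: int, conservative: int, mismatches: int, indels: int) -> float:
    """Identical positions as a percentage of all alignment columns."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


# Counts reported for the slit protein alignment: 69, 40, 44, 3.
print(best_local_similarity(69, 40, 44, 3))
# Counts reported for the decorin alignments: 29, 23, 36, 1.
print(best_local_similarity(29, 23, 36, 1))
```

Both calls reproduce the percentages printed in the corresponding result blocks (44.2% and 32.6%).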



RESULT 2

ENTRY A36665 #type complete

TITLE slit protein 1 precursor - fruit fly (Drosophila

melanogaster)

ORGANISM #formal_name Drosophila melanogaster

DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change

24-Sep-1998
ACCESSIONS A36665; S13523
REFERENCE A36665

#authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.;

Artavanis-Tsakonas, S.
#journal Genes Dev. (1990) 4:2169-2187

#title slit: an extracellular protein necessary for development of
midline glia and commissural axon pathways contains both
EGF and LRR domains.
#cross-references MUID:91099665
#accession A36665

##status preliminary
##molecule_type mRNA
##residues 1-1480 ##label ROT
##cross-references GB:X53959; NID:g8614; PID:g8615
GENETICS

#gene FlyBase:sli
##cross-references FlyBase:FBgn0003425

CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF
homology; leucine-rich alpha-2-glycoprotein repeat
homology; proteoglycan carboxyl-terminal homology

KEYWORDS alternative splicing 

FEATURE

66-91 #domain proteoglycan amino-terminal homology #label PAH1\
101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\
288-313 #domain proteoglycan amino-terminal homology #label PAH2\
323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\
512-537 #domain proteoglycan amino-terminal homology #label PAH3\
547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\
708-733 #domain proteoglycan amino-terminal homology #label PAH4\
743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
791-814 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR17\
815-838 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR18\
846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\
1028-1061 #domain EGF homology #label EGF

SUMMARY #length 1480 #molecular-weight 165751 #checksum 900

Query Match 44.2%; Score 483; DB 2; Length 1480;

Best Local Similarity 44.2%; Pred. No. 6.23e-67;

Matches 69; Conservative 40; Mismatches 44; Indels 3; Gaps 3;

Db 451

M! lllllhlll: 1:11 1 1 1 1 1 1 1 1 1 MM
Qy 1

Db 510

I :M lh I I 1 1 1 1 1 1 I I: II II MMIM M
Qy 60

Db 569

I M 1:1 |::| |: ::|| I ::|| I :
Qy 120 VLENLEXLDLSNNHITFINDKSFEKLSKLRELXLND 155 



RESULT 3 

ENTRY A53531 #type complete

TITLE oncofetal trophoblast glycoprotein 5T4 precursor - human

ALTERNATE_NAMES oncofetal antigen 5T4 - human

ORGANISM #formal_name Homo sapiens #common_name man

DATE 27-Jun-1994 #sequence_revision 27-Jun-1994 #text_change

10-Sep-1997
ACCESSIONS A53531; S40087
REFERENCE A53531

#authors Myers, K.A.; Rahi-Saund, V.; Davison, M.D.; Young, J.A.;
Cheater, A.J.; Stern, P.L.

#journal J. Biol. Chem. (1994) 269:9319-9324

#title Isolation of a cDNA encoding 5T4 oncofetal trophoblast
glycoprotein. An antigen associated with metastasis
contains leucine-rich repeats.
#cross-references MUID:94179356
#accession A53531

##status preliminary
##molecule_type mRNA
##residues 1-420 ##label MEY
##cross-references EMBL:Z29083; NID:g435654; PID:g435655
CLASSIFICATION #superfamily leucine-rich alpha-2-glycoprotein repeat
homology

KEYWORDS duplication; glycoprotein; transmembrane protein

FEATURE

1-31 #domain signal sequence #status predicted #label SIG\

32-420 #product oncofetal trophoblast glycoprotein 5T4 #status

predicted #label MAT
SUMMARY #length 420 #molecular-weight 46031 #checksum 8580

Query Match 18.4%; Score 201; DB 2; Length 420;

Best Local Similarity 31.5%; Pred. No. 3.72e-16; 

Matches 28; Conservative 25; Mismatches 32; Indels 4; Gaps 4; 

Db 62 CPALCECSEAARTVKCVNRNLTEVPTDLPAYVRNLFLTGNQLAVLPAGAFARRPPLAELA 121 

II: hi :: II I :| I :ll :l : hhll :: : : : I :| 
Qy 70 CPTQCDCY-GT-TVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNS-NIHV-LENLE 125 



Db 122 ALNLSGSRLDEVRAGAFEHLPSLRQLDLS 150 
1 = 11 ::: : :|| h Ihl h 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154



RESULT 4

ENTRY S20811 #type complete
TITLE proteoglycan I - mouse
ALTERNATE_NAMES biglycan
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 20-Feb-1995 #sequence_revision 20-Feb-1995 #text_change 05-Dec-1997
ACCESSIONS S20811; A57645; I49534
REFERENCE S20811
#authors Naitoh, Y.; Suzuki, S.
#submission submitted to the EMBL Data Library, July 1990
#description Nucleotide sequences of cDNAs encoding mouse PGI and PGII.
#accession S20811
##status preliminary
##molecule_type mRNA
##residues 1-369 ##label NAI
##cross-references EMBL:X53928; NID:g53666; PID:g53667
REFERENCE A57645
#authors Wegrowski, Y.; Pillarisetti, J.; Danielson, K.G.; Suzuki, S.; Iozzo, R.V.
#journal Genomics (1995) 30:8-17
#title The murine biglycan: complete cDNA cloning, genomic organization, promoter function, and expression.
#cross-references MUID:96129295
#accession A57645
##status preliminary
##molecule_type mRNA
##residues 1-67,'W',69-369 ##label KEG
##cross-references GB:L20276; NID:g348961; PID:g348962
##note authors translated the codon TGG for residue 58 as Cys
REFERENCE I49534
#authors Rau, W.; Just, W.; Vetter, U.; Vogel, W.
#journal Mamm. Genome (1994) 5:395-396
#title A dinucleotide repeat in the mouse biglycan gene (EST) on the X chromosome.
#cross-references MUID:94319093
#accession I49534
##status preliminary; translated from GB/EMBL/DDBJ
##molecule_type mRNA
##residues 1-67,'W',69-369 ##label RES
##cross-references GB:L20276; NID:g348961; PID:g348962
GENETICS
#gene Bgn
CLASSIFICATION #superfamily decorin; leucine-rich alpha-2-glycoprotein repeat homology; proteoglycan amino-terminal homology; proteoglycan carboxyl-terminal homology
KEYWORDS chondroitin sulfate proteoglycan; dermatan sulfate; extracellular matrix; glycoprotein



FEATURE
58-82 #domain proteoglycan amino-terminal homology #label PAH\
92-115 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
#domain leucine-rich alpha-2-glycoprotein repeat homology #status atypical #label LR10\
#domain proteoglycan carboxyl-terminal homology #label PCH
SUMMARY #length 369 #molecular-weight 41639 #checksum 3586

Query Match 18.0%; Score 197; DB 2; Length 369;
Best Local Similarity 34.8%; Pred. No. 1.69e-15;
Matches 31; Conservative 23; Mismatches 34; Indels 1; Gaps 1;

Db 60 FSAMCPFGCHCHLRWQCSDLGLKTVPKEISPDTTLLDLQNNDISELR-KDDFKGLQHLY 118 

= ::H I I hh Ihhl h :| I I 1 = 11 : : :: |::| 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 



Db 119 ALVLVNNRISKIHEKAFSPLRKLQKLYIS 147

I I II h h:|:| I II: I :: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154



RESULT 5 

ENTRY S32793 #type complete
TITLE biglycan precursor - rat
ALTERNATE_NAMES dermatan sulfate proteoglycan I (DS-PGI); proteoglycan I core protein (PG-I)
ORGANISM #formal_name Rattus norvegicus #common_name Norway rat
DATE 02-Dec-1993 #sequence_revision 01-Sep-1995 #text_change 29-Jan-1999
ACCESSIONS S32793
REFERENCE S32793
#authors Dreher, K.L.; Asundi, V.; Matzura, D.; Cowan, K.
#journal Eur. J. Cell Biol. (1990) 53:296-304
#title Vascular smooth muscle biglycan represents a highly conserved proteoglycan within the arterial wall.
#cross-references MUID:91184222
#accession S32793
##status preliminary
##molecule_type mRNA
##residues 1-369 ##label DRE
##cross-references GB:U17834; NID:g600497; PID:g600498
CLASSIFICATION #superfamily decorin; leucine-rich alpha-2-glycoprotein repeat homology; proteoglycan amino-terminal homology; proteoglycan carboxyl-terminal homology
KEYWORDS chondroitin sulfate proteoglycan; dermatan sulfate; extracellular matrix; glycoprotein



FEATURE
1-16 #domain signal sequence #status predicted #label SIG\
17-37 #domain propeptide #status predicted #label PRO\
38-369 #product biglycan #status predicted #label MAT\
58-82 #domain proteoglycan amino-terminal homology #label PAH\
92-115 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
116-139 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
140-160 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
161-184 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
185-208 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
210-230 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
231-254 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
255-278 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
279-301 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
302-316 #domain leucine-rich alpha-2-glycoprotein repeat homology #status atypical #label LR10\
317-369 #domain proteoglycan carboxyl-terminal homology #label PCH\
42,48,181,199 #binding_site dermatan sulfate (Ser) (covalent) #status predicted\
271,312 #binding_site carbohydrate (Asn) (covalent) #status predicted
SUMMARY #length 369 #molecular-weight 41706 #checksum 3056

Query Match 18.0%; Score 197; DB 2; Length 369;
Best Local Similarity 34.8%; Pred. No. 1.69e-15;
Matches 31; Conservative 23; Mismatches 34; Indels 1; Gaps 1;

Db 60 FSAMCPFGCHCHLRWQCSDLGLKTVPKEISPDTTLLDLQNNDISELR-KDDFKGLQHLY 118
^ : ::ll I I |:|: l|:|:| |: :| I I |:|| : : :: |::| 

Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 119 ALVLVNNKISKIHEKAFSPLRKLQKLYIS 147 

I I II I: |::|:| I I: I :: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 6 

ENTRY S06280 #type complete
TITLE decorin precursor - bovine
ALTERNATE_NAMES dermatan sulfate proteoglycan II; proteoglycan core protein II
ORGANISM #formal_name Bos primigenius taurus #common_name cattle
DATE 31-Mar-1990 #sequence_revision 31-Mar-1990 #text_change 14-Nov-1997
ACCESSIONS S06280; B31430; A26545; A20935
#authors Day, A.A.; McQuillan, C.I.; Termine, J.D.; Young, M.F.
#journal Biochem. J. (1987) 248:801-805
#title Molecular cloning and sequence analysis of the cDNA for small proteoglycan II of bovine bone.
#cross-references MUID:88133946
#accession S06280
##molecule_type mRNA
##residues 1-360 ##label DAY
##cross-references EMBL:Y00712; NID:gf; PID:g619
##experimental_source bone
REFERENCE A31430
#authors Choi, H.U.; Johnson, T.L.; Pal, S.; Tang, L.H.; Rosenberg, L.; Neame, P.J.
#journal J. Biol. Chem. (1989) 264:2876-2884
#title Characterization of the dermatan sulfate proteoglycans, DS-PGI and DS-PGII, from bovine articular cartilage and skin isolated by octyl-sepharose chromatography.
#cross-references MUID:89123388
#accession B31430
##molecule_type protein
##residues 31-33,'X',35-54 ##label CHO
##experimental_source cartilage; fetal skin
REFERENCE A26545
#authors Coster, L.; Rosenberg, L.C.; van der Rest, M.; Poole, A.R.
#journal J. Biol. Chem. (1987) 262:3809-3812
#title The dermatan sulfate proteoglycans of bovine sclera and their relationship to those of articular cartilage. An immunological and biochemical study.
#cross-references MUID:87137687
#accession A26545
##molecule_type protein
##residues 31-50 ##label COS
##experimental_source sclera
REFERENCE A20935
#authors Pearson, C.H.; Winterbottom, N.; Fackre, D.S.; Scott, P.G.; Carpenter, M.R.
#journal J. Biol. Chem. (1983) 258:15101-15104
#cross-references MUID:84087911
#accession A20935
##molecule_type protein
##residues 31-54 ##label PEA
##experimental_source skin
REFERENCE A44700
#authors Chopra, R.K.; Pearson, C.H.; Pringle, G.A.; Fackre, D.S.; Scott, P.G.
#journal Biochem. J. (1985) 232:277-279
#title Dermatan sulphate is located on serine-4 of bovine skin proteodermatan sulphate. Demonstration that most molecules possess only one glycosaminoglycan chain and comparison of amino acid sequences around glycosylation sites in different proteoglycans.
#contents annotation; glycosylation
CLASSIFICATION #superfamily decorin; leucine-rich alpha-2-glycoprotein repeat homology; proteoglycan amino-terminal homology; proteoglycan carboxyl-terminal homology
KEYWORDS chondroitin sulfate proteoglycan; collagen binding; dermatan sulfate; extracellular matrix; glycoprotein



FEATURE
1-15 #domain signal sequence #status predicted #label SIG\
16-30 #domain propeptide #status predicted #label PRO\
31-360 #product decorin #status predicted #label MAT\
49-73 #domain proteoglycan amino-terminal homology #label PAH\
83-106 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
107-130 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
271-293 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
294-308 #domain leucine-rich alpha-2-glycoprotein repeat homology #status atypical #label LR10\
309-360 #domain proteoglycan carboxyl-terminal homology #label PCH\
34 #binding_site dermatan sulfate (Ser) (covalent) #status experimental
190,326 #binding_site dermatan sulfate (Ser) (covalent) #status predicted\
212,263,304 #binding_site carbohydrate (Asn) (covalent) #status predicted
SUMMARY #length 360 #molecular-weight 39837 #checksum 9778

Query Match 17.9%; Score 195; DB 2; Length 360;
Best Local Similarity 34.4%; Pred. No. 3.59e-15;
Matches 31; Conservative 21; Mismatches 35; Indels 3; Gaps 3;

Db 51 MGPVCPFRCQCHLRWQCSDLGLEKVPKDLPP-DIALLDLQNNKITEIK-DGDFKNLKNL 108 

: ::M :|:| l:|: II: :l :| I II I |:|: : :::: I II 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLL-LSGNNISTVDLNSNIHVLENL 124 

Db 109 HTLILINNKISKISPGAFAPLVKLERLYLS 138 

I I II I: I: :| I II I I: 
Qy 125 EXLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 7 

ENTRY I47020 #type complete
TITLE decorin - rabbit
ORGANISM #formal_name Oryctolagus cuniculus #common_name domestic rabbit
DATE 04-Sep-1997 #sequence_revision 04-Sep-1997 #text_change 16-Dec-1998
ACCESSIONS I47020
REFERENCE I47020
#authors Zhan, Q.; Burrows, R.; Cintron, C.
#journal Invest. Ophthalmol. Vis. Sci. (1995) 36:206-215
#title Cloning and in situ hybridization of rabbit decorin in corneal tissues.
#cross-references MUID:95122319
#accession I47020
##status preliminary; translated from GB/EMBL/DDBJ
##molecule_type mRNA
##residues 1-360 ##label ZHA
##cross-references GB:S76584; NID:g913374; PID:g913375
CLASSIFICATION #superfamily decorin; leucine-rich alpha-2-glycoprotein repeat homology; proteoglycan amino-terminal homology; proteoglycan carboxyl-terminal homology



FEATURE
49-73 #domain proteoglycan amino-terminal homology #label PAH\
83-106 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
107-130 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
131-151 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
152-175 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
176-199 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
202-222 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
223-246 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
247-270 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
271-293 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
294-308 #domain leucine-rich alpha-2-glycoprotein repeat homology #status atypical #label LR10\
309-360 #domain proteoglycan carboxyl-terminal homology #label PCH
SUMMARY #length 360 #molecular-weight 39896 #checksum 8087

Query Match 17.9%; Score 195; DB 2; Length 360;
Best Local Similarity 33.7%; Pred. No. 3.59e-15;
Matches 30; Conservative 22; Mismatches 36; Indels 1; Gaps 1;

Db 51 LGPVCPFRCQCHLRWQCSDLGLDKVPKDLPPDTTLLDLQNNKITEIK-DGDFKNLKNLH 109 

: ::H :|:| |:|: lh :| :| :| I I ,hl: : :::: I II 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 110 ALILVNNKISKISPGAFTPLVKLERLYLS 138 

I I II I: I: :| I II I |: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 8 

ENTRY NBHUC8 #type complete
TITLE decorin precursor - human
ALTERNATE_NAMES cartilage proteoglycan protein II; DS-PG II; PG40 core protein; proteoglycan 38K core protein
ORGANISM #formal_name Homo sapiens #common_name man
DATE 30-Jun-1988 #sequence_revision 30-Jun-1988 #text_change 31-Dec-1998
ACCESSIONS A45016; A45015; B45015; A26476; S05640
REFERENCE A45016
#authors Vetter, U.; Vogel, W.; Just, W.; Young, M.F.; Fisher, L.W.
#journal Genomics (1993) 15:161-168
#title Human decorin gene: intron-exon junctions and chromosomal localization.
#cross-references MUID:93162643
#accession A45016
##molecule_type DNA
##residues 1-359 ##label VET
##cross-references GB:L01125; GB:L01126; GB:L01127; GB:L01128; GB:L01129; GB:L01130; GB:L01131
##note sequence extracted from NCBI backbone (NCBIP:125061)
REFERENCE A45015
#authors Danielson, K.G.; Fazzio, A.; Cohen, I.; Cannizzaro, L.A.; Eichstetter, I.; Iozzo, R.V.
#journal Genomics (1993) 15:146-160
#title The human decorin gene: intron-exon organization, discovery of two alternatively spliced exons in the 5' untranslated region, and mapping of the gene to chromosome 12q23.
#cross-references MUID:93162642
#accession A45015
##status not compared with conceptual translation
##molecule_type DNA
##residues 28-70 ##label DA2
##cross-references GB:M98262
##note sequence extracted from NCBI backbone (NCBIP:125013)
#accession B45015
##status not compared with conceptual translation
##molecule_type DNA
##residues 296-359 ##label DAN
##note sequence extracted from NCBI backbone (NCBIP:125017)
REFERENCE A26476
#authors Krusius, T.; Ruoslahti, E.
#journal Proc. Natl. Acad. Sci. U.S.A. (1986) 83:7683-7687
#title Primary structure of an extracellular matrix proteoglycan core protein deduced from cloned cDNA.
#cross-references MUID:87017013
#accession A26476
##molecule_type mRNA
##residues 1-359 ##label KRU
##cross-references GB:M14219; NID:g181169; PID:g181170
REFERENCE S05639
#authors Roughley, P.J.; White, R.J.
#journal Biochem. J. (1989) 262:823-827
#title Dermatan sulphate proteoglycans of human articular cartilage. The properties of dermatan sulphate proteoglycans I and II.
#cross-references MUID:90073579
#accession S05640
##molecule_type protein
##residues 31-33,'X',35-50 ##label ROU
COMMENT This protein binds type I collagen.
GENETICS
#gene GDB:DCN
##cross-references GDB:119839; OMIM:125255
#map_position 12q21.3-12q23
#introns 71/1; 108/3; 180/1; 218/1; 249/2; 295/3
#note the first two introns occur before the initiator codon
CLASSIFICATION #superfamily decorin; leucine-rich alpha-2-glycoprotein repeat homology; proteoglycan amino-terminal homology; proteoglycan carboxyl-terminal homology
KEYWORDS chondroitin sulfate proteoglycan; collagen binding; dermatan sulfate; duplication; extracellular matrix; fibroblast; glycoprotein; tandem repeat



FEATURE
1-16 #domain signal sequence #status predicted #label SIG\
17-30 #domain propeptide #status predicted #label PRO\
31-359 #product decorin #status predicted #label MAT\
48-72 #domain proteoglycan amino-terminal homology #label PAH\
82-105 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
106-129 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
270-292 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
#binding_site dermatan sulfate (Ser) (covalent) #status experimental
#binding_site dermatan sulfate (Ser) (covalent) #status predicted\
211,262,303 #binding_site carbohydrate (Asn) (covalent) #status predicted
SUMMARY #length 359 #molecular-weight 39746 #checksum 835

Query Match 17.7%; Score 193; DB 1; Length 359;
Best Local Similarity 32.6%; Pred. No. 7.61e-15;
Matches 29; Conservative 23; Mismatches 36; Indels 1; Gaps 1;



Db 50 LGPVCPFRCQCHLRWQCSDLGLDKVPKDLPPDTTLLDLQNNKITEIK-DGDFKNLKNLH 108 

: ::|| :|:| |:|: ||: :| :| :| I I |:|: : :::: | || 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 109 ALILVNNKISKVSPGAFTPLVKLERLYLS 137 

I I II I: :: :| I II I I: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 9 

ENTRY BGHUN #type complete
TITLE biglycan precursor - human
ALTERNATE_NAMES cartilage proteoglycan protein I; dermatan sulfate proteoglycan I (DS-PGI); proteoglycan I core protein (PG-I)
ORGANISM #formal_name Homo sapiens #common_name man
DATE 21-Apr-1992 #sequence_revision 26-May-1995 #text_change 14-Nov-1997
ACCESSIONS A40757; A32458; S05639; A28457; S14349
REFERENCE A40757
#authors Fisher, L.W.; Heegaard, A.M.; Vetter, U.; Vogel, W.; Just, W.; Termine, J.D.; Young, M.F.
#journal J. Biol. Chem. (1991) 266:14371-14377
#title Human biglycan gene. Putative promoter, intron-exon junctions, and chromosomal localization.
#cross-references MUID:91317791
#accession A40757
##molecule_type DNA
##residues 1-368 ##label FIS
##cross-references GB:M65151; GB:M65152; GB:M65153
REFERENCE A32458
#authors Fisher, L.W.; Termine, J.D.; Young, M.F.
#journal J. Biol. Chem. (1989) 264:4571-4576
#title Deduced protein sequence of bone small proteoglycan I (Biglycan) shows homology with proteoglycan II (Decorin) and several nonconnective tissue proteins in a variety of
#cross-references MUID:89174714
#accession A32458
##molecule_type mRNA
##residues 1-138,'NV',141-162,'DV',165-368 ##label FI2
##cross-references GB:J04599
##note parts of this sequence, including the amino end of the mature protein, were determined by protein sequencing
REFERENCE S05639
#authors Roughley, P.J.; White, R.J.
#journal Biochem. J. (1989) 262:823-827
#title Dermatan sulphate proteoglycans of human articular cartilage. The properties of dermatan sulphate proteoglycans I and II.
#cross-references MUID:90073579
#accession S05639
##molecule_type protein
##residues 38-41,'X',43-46,'X',48-57 ##label ROU
REFERENCE A92656
#authors Fisher, L.W.; Hawkins, G.R.; Tuross, N.; Termine, J.D.
#journal J. Biol. Chem. (1987) 262:9702-9708
#title Purification and partial characterization of small proteoglycans I and II, bone sialoproteins I and II, and osteonectin from the mineral compartment of developing human bone.
#cross-references MUID:87250639
#accession A28457
##molecule_type protein
##residues 38-41,'X',43-62,'X',64-66 ##label FI3
##experimental_source bone
REFERENCE S14349
#authors Stoecker, G.; Meyer, H.E.; Wagener, C.; Greiling, H.
#journal Biochem. J. (1991) 274:415-420
#title Purification and N-terminal amino acid sequence of a chondroitin sulphate/dermatan sulphate proteoglycan isolated from intima/media preparations of human aorta.
#cross-references MUID:91174749
#accession S14349
##molecule_type protein
##residues 38-57 ##label STO
##experimental_source aorta
GENETICS
#gene GDB:BGN
##cross-references GDB:119727; OMIM:301870
#map_position Xq28-Xq28
#introns 80/1; 117/3; 189/1; 226/1; 257/2; 303/3
CLASSIFICATION #superfamily decorin; leucine-rich alpha-2-glycoprotein repeat homology; proteoglycan amino-terminal homology; proteoglycan carboxyl-terminal homology
KEYWORDS chondroitin sulfate proteoglycan; dermatan sulfate; duplication; extracellular matrix; glycoprotein; tandem repeat

FEATURE
1-16 #domain signal sequence #status predicted #label SIG\
17-37 #domain propeptide #status predicted #label PRO\
38-368 #product biglycan #status predicted #label MAT\
57-81 #domain proteoglycan amino-terminal homology #label PAH\
91-114 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
115-138 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
139-159 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
160-183 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
184-207 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
209-229 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
230-253 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
254-277 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
278-300 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
301-315 #domain leucine-rich alpha-2-glycoprotein repeat homology #status atypical #label LR10\
316-368 #domain proteoglycan carboxyl-terminal homology #label PCH\
42,47 #binding_site dermatan sulfate (Ser) (covalent) #status experimental
#binding_site dermatan sulfate (Ser) (covalent) #status predicted\
270,311 #binding_site carbohydrate (Asn) (covalent) #status predicted
SUMMARY #length 368 #molecular-weight 41654 #checksum 1684



Query Match 17.7%; Score 193; DB 1; Length 368; 

Best Local Similarity 34.5%; Pred. No. 7.61e-15; 

Matches 30; Conservative 23; Mismatches 33; Indels 1; Gaps 1; 

Db 61 AMCPFGCHCHLRWQCSDLGLKSVPKEISPDTTLLDLQNNDISELR-KDDFKGLQHLYAL 119 

"II I I hh ll:::| |: :| I I |:|| : : :: |::| | 
Qy 68 SICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLEXL 127 

Db 120 VLVNNKISKIHEKAFSPLRKLQKLYIS 146 

I II I: !::!:! I II: I 
Qy 128 DLSNNHITFINDKSFEKLSKLRELXLN 154



RESULT 10

ENTRY S32559 #type complete
TITLE biglycan precursor - bovine
ALTERNATE_NAMES dermatan sulfate proteoglycan I (DS-PGI); proteochondroitin core protein; proteoglycan I core protein (PG-I)
ORGANISM #formal_name Bos primigenius taurus #common_name cattle
DATE 03-May-1994 #sequence_revision 20-Feb-1995 #text_change 14-Nov-1997
ACCESSIONS S32559; S34229; A33701; A31430; PT0078; S55673; A33137
REFERENCE S32559
#authors Torok, M.A.; Evans, S.A.S.; Marcum, J.A.
#journal Biochim. Biophys. Acta (1993) 1173:81-84
#title cDNA sequence for bovine biglycan (PGI) protein core.
#accession S32559
##molecule_type mRNA
##residues 1-369 ##label TOR
##cross-references EMBL:L07953; NID:g162746
##experimental_source aortic smooth muscle
REFERENCE S34229
#authors Marcum, J.A.; Torok, M.; Evans, S.
#submission submitted to the EMBL Data Library, December 1992
#accession S34229
##molecule_type mRNA
##residues 1-250,'V',252-369 ##label MAR
##cross-references EMBL:L07953
REFERENCE A33701
#authors Neame, P.J.; Choi, H.U.; Rosenberg, L.C.
#journal J. Biol. Chem. (1989) 264:8653-8661
#title The primary structure of the core protein of the small, leucine-rich proteoglycan (PG I) from bovine articular cartilage.
#cross-references MUID:89255324
#accession A33701
##molecule_type protein
##residues 38-187,'E',189-367,'Y' ##label NEA
##experimental_source cartilage
REFERENCE A31430
#authors Choi, H.U.; Johnson, T.L.; Pal, S.; Tang, L.H.; Rosenberg, L.; Neame, P.J.
#journal J. Biol. Chem. (1989) 264:2876-2884
#title Characterization of the dermatan sulfate proteoglycans, DS-PGI and DS-PGII, from bovine articular cartilage and skin isolated by octyl-sepharose chromatography.
#cross-references MUID:89123388
#accession A31430
##molecule_type protein
##residues 38-41,'X',43-47,'X',49-63 ##label CHO
##note sequences from skin and cartilage were identical
REFERENCE PT0077
#authors Marcum, J.A.; Thompson, M.A.
#journal Biochem. Biophys. Res. Commun. (1991) 175:706-712
#title The amino-terminal region of a proteochondroitin core protein, secreted by aortic smooth muscle cells, shares sequence homology with the pre-propeptide region of the biglycan core protein from human bone.
#cross-references MUID:91207372
#accession PT0078
##molecule_type protein
##residues 17-24,'F',26-30 ##label MA2
##experimental_source aortic smooth muscle
REFERENCE S55673
#authors Scott, P.G.; Nakano, T.; Dodd, C.M.
#journal Biochim. Biophys. Acta (1995) 1244:121-128
#title Small proteoglycans from different regions of the fibrocartilaginous temporomandibular joint disc.
#cross-references MUID:95284073
#accession S55673
##molecule_type protein
##residues 38-41,'X',43-47,'X',49-53 ##label SCO
CLASSIFICATION #superfamily decorin; leucine-rich alpha-2-glycoprotein repeat homology; proteoglycan amino-terminal homology; proteoglycan carboxyl-terminal homology
KEYWORDS cartilage; chondroitin sulfate proteoglycan; dermatan sulfate; extracellular matrix; glycoprotein



FEATURE
1-16 #domain signal sequence #status predicted #label SIG\
17-37 #domain amino-terminal propeptide #status predicted #label PRO\
38-369 #product biglycan #status predicted #label MAT\
58-82 #domain proteoglycan amino-terminal homology #label PAH\
92-115 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
116-139 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
210-230 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
#domain leucine-rich alpha-2-glycoprotein repeat homology #status atypical #label LR10\
#domain proteoglycan carboxyl-terminal homology #label PCH\
#binding_site dermatan sulfate (Ser) (covalent) #status experimental
#binding_site dermatan sulfate (Ser) (covalent) #status predicted\
271,312 #binding_site carbohydrate (Asn) (covalent) #status predicted
SUMMARY #length 369 #molecular-weight 41590 #checksum 1525

Query Match 17.7%; Score 193; DB 2; Length 369;
Best Local Similarity 34.5%; Pred. No. 7.61e-15;
Matches 30; Conservative 23; Mismatches 33; Indels 1; Gaps 1;

Db 62 AMCPFGCHCHLRWQCSDLGLKAVPKEISPDTTLLDLQNNDISELR-KDDFKGLQHLYAL 120 

::H I I hi: ll:::| I: :| I I |:|| : : :: |::| I 
Qy 68 SICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLEXL 127

Db 121 VLVNNKISKIHEKAFSPLRKLQKLYIS 147 

I II I: |::|:| I ||: I :: 
Qy 128 DLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 11 

ENTRY S24317 #type complete
TITLE decorin precursor - chicken
ALTERNATE_NAMES corneal chondroitin/dermatan sulfate proteoglycan
ORGANISM #formal_name Gallus gallus #common_name chicken
DATE 13-Jan-1995 #sequence_revision 13-Jan-1995 #text_change 14-Nov-1997
ACCESSIONS S24317; S58474; S22197
REFERENCE S24317
#authors Li, W.; Vergnes, J.P.; Cornuet, P.K.; Hassell, J.R.
#journal Arch. Biochem. Biophys. (1992) 296:190-197
#title cDNA clone to chick corneal chondroitin/dermatan sulfate proteoglycan reveals identity to decorin.
#cross-references MUID:92296755
#accession S24317
##molecule_type mRNA
##residues 1-357 ##label LIW
##cross-references EMBL:X63797; NID:g62887; PID:g62888
#accession S58474
##molecule_type protein
##residues 31-33,'X',35-39,'X',41-48,'X',50-51 ##label LIA
CLASSIFICATION #superfamily decorin; leucine-rich alpha-2-glycoprotein repeat homology; proteoglycan amino-terminal homology; proteoglycan carboxyl-terminal homology
KEYWORDS collagen binding; extracellular matrix; glycoprotein
FEATURE
1-11
17
31
46-1
Al04<
^128-
149
173'
199'
220'
244'
268'
291'
306'
#domain signal sequence #status predicted #label SIG\
#domain propeptide #status predicted #label PRO\
#product decorin #status experimental #label MAT\
#domain proteoglycan amino-terminal homology #label PAH\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
#domain leucine-rich alpha-2-glycoprotein repeat homology #status atypical #label LR10\
#domain proteoglycan carboxyl-terminal homology #label PCH
SUMMARY #length 357 #molecular-weight 39687 #checksum 4375

Query Match 17.6%; Score 192; DB 2; Length 357;
Best Local Similarity 33.7%; Pred. No. 1.11e-14;
Matches 30; Conservative 22; Mismatches 36; Indels 1; Gaps 1;

Db 48 FGPVCPFRCQCHLRWQCSDLGLERVPKDLPPDTTLLDLQNNKITEIK-EGDFKNLKNLH 106

: ::ll :|:| l:|: II: :| :| :| I I |:|: : :::: I II 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 107 ALILVNNKISKISPAAFAPLKKLERLYLS 135

II II I: I: :| I II I |: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 12
ENTRY S72271 #type complete
TITLE proteoglycan Lb precursor - mouse
ALTERNATE_NAMES chondroitin/dermatan sulphate proteoglycan
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 14-Apr-1998 #sequence_revision 24-Apr-1998 #text_change
   13-Sep-1998
ACCESSIONS S72271
REFERENCE S72271
   #authors Kurita, K.; Shinomura, T.; Ujita, M.; Zako, M.; Kida, D.;
      Iwata, H.; Kimata, K.
   #journal Biochem. J. (1996) 318:909-914
   #title Occurrence of PG-Lb, a leucine-rich small
      chondroitin/dermatan sulphate proteoglycan in mammalian
      epiphyseal cartilage: molecular cloning and sequence
      analysis of the mouse cDNA.
   #accession S72271
   ##molecule_type mRNA
   ##residues 1-322 ##label KUR
   ##cross-references EMBL:D78274; NID:g1620004; PID:d1011999; PID:g1620005
   ##experimental_source strain BALB/c; newborne; epiphyseal cartilage
FUNCTION
   #description probably participates in ossification process
CLASSIFICATION #superfamily osteoinductive factor precursor; leucine-rich
   alpha-2-glycoprotein repeat homology
KEYWORDS glycoprotein
FEATURE
   1-25 #domain signal sequence #status predicted #label SIG\
   26-322 #product proteoglycan Lb #status predicted #label MAT\
   145-167 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   168-191 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   283,302 #binding_site carbohydrate (Asn) (covalent) #status
SUMMARY #length 322 #molecular-weight 36762 #checksum 9522



Query Match 16.8%; Score 184; DB 2; Length 322; 

Best Local Similarity 34.1%; Pred. No. 2.18e-13;

Matches 28; Conservative 27; Mismatches 25; Indels 2; Gaps 2; 

Db 121 CTCISTTVYCDDHELDAIP-PLPKKTTYFYSRFNRIKKIN-KNDFASLNDLKRIDLTSNL 178 

I I :lll I: : l::ll :: II:: :::: |::| :||::| 

Qy 74 CDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLEXLDLSNNH 133 

Db 179 ISEIDEDAFRKLPHLQELVLRD 200

I: I:: :| 11 = hll I I 
Qy 134 ITFINDKSFEKLSKLRELXLND 155 



RESULT 13 

ENTRY A55454 #type complete
TITLE decorin precursor - mouse
ALTERNATE_NAMES proteoglycan II
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 24-Feb-1995 #sequence_revision 24-Feb-1995 #text_change
   14-Nov-1997
ACCESSIONS A55454; S20812
REFERENCE A55454
   #authors Scholzen, T.; Solursh, M.; Suzuki, S.; Reiter, R.; Morgan,
      J.L.; Buchberg, A.M.; Siracusa, L.D.; Iozzo, R.V.
   #journal J. Biol. Chem. (1994) 269:28270-28281
   #title The murine decorin. Complete cDNA cloning, genomic
      organization, chromosomal assignment, and expression during
      organogenesis and tissue differentiation.
   #cross-references MUID:95050610
   #accession A55454
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-354 ##label SCH
   ##cross-references GB:X53929; NID:g53668; PID:g53669
REFERENCE S20811
   #authors Naitoh, Y.; Suzuki, S.
   #submission submitted to the EMBL Data Library, July 1990
   #description Nucleotide sequences of cDNAs encoding mouse PGI and PGII.
   #accession S20812
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-354 ##label NAI
   ##cross-references EMBL:X53929; NID:g53668; PID:g53669
CLASSIFICATION #superfamily decorin; leucine-rich alpha-2-glycoprotein
   repeat homology; proteoglycan amino-terminal homology;
   proteoglycan carboxyl-terminal homology
KEYWORDS collagen binding; extracellular matrix; glycoprotein 
FEATURE
   1-16 #domain signal sequence #status predicted #label SIG\
   17-30 #domain propeptide #status predicted #label PRO\
   43-67 #domain proteoglycan amino-terminal homology #label PAH\
   77-100 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   125-145 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   146-169 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   170-193 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   196-216 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   217-240 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   241-264 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   265-287 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   288-302 #domain leucine-rich alpha-2-glycoprotein repeat homology #status atypical #label LR10\
   303-354 #domain proteoglycan carboxyl-terminal homology #label PCH
SUMMARY #length 354 #molecular-weight 39809 #checksum 3271

Query Match 16.8%; Score 184; DB 2; Length 354; 

Best Local Similarity 34.4%; Pred. No. 2.18e-13;

Matches 33; Conservative 23; Mismatches 37; Indels 3; Gaps 3; 

Db 39 YDPDNPLI-SMCPYRCQCHLRVVQCSDLGLDKVPWDFPPDTTLLDLQNNKITEIK-EGAF 96

I :h :| Ml :|:| IM ||: :| :| :| I I |:|: : :: : 
Qy 60 Y-ADSCFIDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNI 118

Db 97 KNLKDLHTLILVNNKISKISPEAFKPLVKLERLYLS 132 

IM I Ml I: I: :| I II I |: 
Qy 119 HVLENLEXLDLSNNHITFINDKSFEKLSKLRELXLN 154



RESULT 14 
ENTRY S29145 #type complete
TITLE decorin precursor - rat
ALTERNATE_NAMES dermatan sulfate proteoglycan-II
ORGANISM #formal_name Rattus norvegicus #common_name Norway rat
DATE 13-Jan-1995 #sequence_revision 13-Jan-1995 #text_change
   17-Mar-1999
ACCESSIONS S29145; I60238; S28517
REFERENCE S29145
   #authors Abramson, S.R.; Woessner Jr., J.F.
   #journal Biochim. Biophys. Acta (1992) 1132:225-227
   #title cDNA sequence for rat dermatan sulfate proteoglycan-II
      (decorin).
   #cross-references MUID:93003331
   #accession S29145
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-354 ##label ABR
   ##cross-references EMBL:Z12298; NID:g57549; PID:g57550
REFERENCE I60238
   #authors Asundi, V.K.; Dreher, K.L.
   #journal Eur. J. Cell Biol. (1992) 59:314-321
   #title Molecular characterization of vascular smooth muscle decorin:
      deduced core protein structure and regulation of gene
   #cross-references MUID:93154359
   #accession I60238
   ##status preliminary; translated from GB/EMBL/DDBJ
   ##molecule_type mRNA
   ##residues 11-354 ##label RES
   ##cross-references EMBL:X59859; NID:g56056; PID:g56057
GENETICS
   #gene DCN
CLASSIFICATION #superfamily decorin; leucine-rich alpha-2-glycoprotein
   repeat homology; proteoglycan amino-terminal homology;
   proteoglycan carboxyl-terminal homology
KEYWORDS collagen binding; extracellular matrix; glycoprotein

FEATURE
   1-30 #domain signal sequence (fragment) #status predicted #label SIG\
   31-354 #product decorin #status predicted #label MAT\
   43-67 #domain proteoglycan amino-terminal homology #label PAH\
   77-100 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   125-145 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   146-169 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   170-193 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   196-216 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   217-240 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   241-264 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   265-287 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   288-302 #domain leucine-rich alpha-2-glycoprotein repeat homology #status atypical #label LR10\
   303-354 #domain proteoglycan carboxyl-terminal homology #label
SUMMARY #length 354 #molecular-weight 39805 #checksum 4526

Query Match 16.8%; Score 183; DB 2; Length 354;

Best Local Similarity 34.4%; Pred. No. 3.16e-13;

Matches 33; Conservative 23; Mismatches 37; Indels 3; Gaps 3; 

Db 39 YDPDNPLI-SMCPYRCQCHLRVVQCSDLGLDKVPWEFPPDTTLLDLQNNKITEIK-EGAF 96

I :h :| 1 = 11 Ml IM II: :| :| :| I I |:| 

Qy 60 Y-ADSCFIDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNI 118 

Db 97 KNLKDLHTLILVNNKISKISPEAFKPLVKLERLYLS 132 
I : | | || |: |: :| | || | |;
Qy 119 HVLENLEXLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 15 

ENTRY A41781 #type complete
TITLE proteoglycan-Lb - chicken
ORGANISM #formal_name Gallus gallus #common_name chicken
DATE 04-Mar-1993 #sequence_revision 18-Nov-1994 #text_change
   04-Sep-1998
ACCESSIONS A41781
REFERENCE A41781
   #authors Shinomura, T.; Kimata, K.
   #journal J. Biol. Chem. (1992) 267:1265-1270
   #title Proteoglycan-Lb, a small dermatan sulfate proteoglycan
      expressed in embryonic chick epiphyseal cartilage, is
      structurally related to osteoinductive factor.
   #cross-references MUID:92112771
   #accession A41781
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-316 ##label SHI
   ##cross-references GB:D10485; GB:D90461; NID:g222846; PID:d1001838;
      PID:g222847
   ##experimental_source embryo
   ##note sequence extracted from NCBI backbone (NCBIN:76536,
      NCBIP:76537)

CLASSIFICATION #superfamily osteoinductive factor precursor; leucine-rich
   alpha-2-glycoprotein repeat homology
FEATURE
   139-161 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6
SUMMARY #length 316 #molecular-weight 35856 #checksum 111



Query Match 16.7%; Score 182; DB 2; Length 316;

Best Local Similarity 36.1%; Pred. No. 4.57e-13;

Matches 30; Conservative 23; Mismatches 26; Indels 4; Gaps 4; 

Db 115 CTCLGTTVYCDDRELDAVP-PLPK-NTMYFYSRYNRIRKIN-KNDFANLNNLKRIDLTAN 171

I I Mil I: I |:::| ::|: I : I I I :: :::: |:|| :||: I
Qy 74 CDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSG-NNISTVDLNSNIHVLENLEXLDLSNN 132

Db 172 LISEIHEDAFRRLPQLLELVLRD 194

I: I:: :| :|: I II I I
Qy 133 HITFINDKSFEKLSKLRELXLND 155



Search completed: Fri May 28 09:04:25 1999 
Job time : 28 secs.
Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 09:04:42 1999; MasPar time 7.04 Seconds 

622.440 Million cell updates/sec 

Tabular output not generated. 

Title: MJS-09-191-647-8 

Description: (1-155) from US09191647.pep

Perfect Score: 1092 

Sequence: 1 RNPXICDCNLQWLAQINLQK FINDKSFEKLSKLRELXLND 155 

Scoring table: PAM 150 
Gap 11 

Searched: 77977 seqs, 28268293 residues 
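
The throughput figure in this header can be cross-checked from the other values shown. The sketch below assumes that one "cell update" is one dynamic-programming cell, so that the total is the query length multiplied by the number of database residues; the report does not state this definition, and the function name is only illustrative.

```python
def cell_updates_per_second(query_len, db_residues, seconds):
    """Cells filled per second when every query residue is compared
    against every database residue (assumed definition)."""
    return query_len * db_residues / seconds


# Values taken from this run: 155-residue query, 28268293 residues, 7.04 s.
rate = cell_updates_per_second(155, 28_268_293, 7.04)
print(f"{rate / 1e6:.1f} million cell updates/sec")
```

The result is about 622 million, close to the 622.440 reported; the small difference is consistent with the run time being rounded to two decimals.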



Post-processing: Minimum Match 0% 

Listing first 45 summaries 
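
For orientation, a minimal sketch of Smith-Waterman local alignment scoring follows. It is not the MasPar implementation: the match and mismatch values are hypothetical stand-ins for the PAM 150 table, and the gap value of 11 is applied as a simple per-residue (linear) penalty, which is an assumption about how "Gap 11" is used.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=11):
    """Best local alignment score, keeping only two rows of the DP matrix.

    match/mismatch are illustrative values, not PAM 150 entries;
    gap is a linear per-residue penalty (assumption).
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] - gap        # gap in the target
            left = curr[j - 1] - gap  # gap in the query
            cell = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Two short fragments that share the GTTVDC motif.
print(smith_waterman_score("CDCYGTTVDC", "CHCEGTTVDC"))
```

Clamping each cell at zero is what makes the alignment local: a poorly matching prefix never drags down a later high-scoring region.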



Database: swiss-prot37
   1:swissprot

Statistics: Mean 44.155; Variance 77.249; scale 0.572



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES 



Result           Query
   No.  Score    Match  Length  DB  ID           Description             Pred. No.
    1    483     44.2    1480   1  SLIT_DROME   SLIT PROTEIN PRECURSOR  1.19e-80
    2    197     18.0     369   1  PGS1_RAT     BONE/CARTILAGE PROTEOG  4.95e-19
    3    197     18.0     369   1  PGS1_MOUSE   BONE/CARTILAGE PROTEOG  4.95e-19
    4    195     17.9     360   1  PGS2_BOVIN   BONE PROTEOGLYCAN II P  1.22e-18
    5    193     17.7     359   1  PGS2_HUMAN   BONE PROTEOGLYCAN II P  2.98e-18
    6    193     17.7     360   1  PGS2_CANFA   BONE PROTEOGLYCAN II P  2.98e-18
    7    193     17.7     368   1  PGS1_HUMAN   BONE/CARTILAGE PROTEOG  2.98e-18
    8    193     17.7     369   1  PGS1_BOVIN   BONE/CARTILAGE PROTEOG  2.98e-18
    9    192     17.6     357   1  PGS2_CHICK   BONE PROTEOGLYCAN II P  4.66e-18
   10    189     17.3     369   1  PGS1_CANFA   BONE/CARTILAGE PROTEOG  1.78e-17
   11    184     16.8     354   1  PGS2_MOUSE   BONE PROTEOGLYCAN II P  1.63e-16
   12    183     16.8     354   1  PGS2_RAT     BONE PROTEOGLYCAN II P  2.53e-16
   13    162     14.8     208   1  GPBB_PAPCY   PLATELET GLYCOPROTEIN   2.24e-12
   14    159     14.6     206   1  GPBB_HUMAN   PLATELET GLYCOPROTEIN   7.95e-12
   15    159     14.6     206   1  GPBB_MOUSE   PLATELET GLYCOPROTEIN   7.95e-12
   16    159     14.6     343   1  LUM_CHICK    LUMICAN PRECURSOR (LUM  7.95e-12
   17    147     13.5     662   1  GARP_HUMAN   GARP PROTEIN PRECURSOR  1.16e-09
   18    142     13.0     440   1  OMGP_MOUSE   OLIGODENDROCYTE-MYELIN  8.89e-09
   19    140     12.8     338   1  LUM_MOUSE    LUMICAN PRECURSOR (LUM  1.99e-08
   20    140     12.8     440   1  OMGP_HUMAN   OLIGODENDROCYTE-MYELIN  1.99e-08
   21    139     12.7     560   1  GPV_HUMAN    PLATELET GLYCOPROTEIN   2.97e-08
   22    137     12.5     567   1  GPV_RAT      PLATELET GLYCOPROTEIN   6.59e-08
   23    136     12.5     682   1  CONN_DROME   CONNECTIN PRECURSOR.    9.81e-08
   24    136     12.5    4303   1  PKD1_HUMAN   POLYCYSTIN PRECURSOR (  9.81e-08
   25    134     12.3     605   1  ALS_HUMAN    INSULIN-LIKE GROWTH FA  2.16e-07
   26    131     12.0    1134   1  CHAO_DROME   CHAOPTIN PRECURSOR (PH  7.01e-07
   27    130     11.9     298   1  OIF_HUMAN    OSTEOINDUCTIVE FACTOR   1.03e-06
   28    130     11.9     299   1  OIF_BOVIN    OSTEOINDUCTIVE FACTOR   1.03e-06
   29    130     11.9     338   1  LUM_RAT      LUMICAN PRECURSOR (LUM  1.03e-06
   30    130     11.9     380   1  FMOD_CHICK   FIBROMODULIN PRECURSOR  1.03e-06
   31    128     11.7     342   1  LUM_BOVIN    LUMICAN PRECURSOR (LUM  2.24e-06
   32    128     11.7     382   1  PARG_HUMAN   PROLARGIN PRECURSOR (P  2.24e-06
   33    128     11.7    1115   1  GPCR_LYMST   G-PROTEIN COUPLED RECE  2.24e-06
   34    127     11.6     361   1  CHAD_BOVIN   CHONDROADHERIN PRECURS  3.30e-06
   35    127     11.6     603   1  ALS_MOUSE    INSULIN-LIKE GROWTH FA  3.30e-06
   36    124     11.4     338   1  LUM_HUMAN    LUMICAN PRECURSOR (LUM  1.04e-05
   37    124     11.4     605   1  ALS_PAPHA    INSULIN-LIKE GROWTH FA  1.04e-05
   38    123     11.3     536   1  CBP8_HUMAN   CARBOXYPEPTIDASE N 83   1.52e-05
   39    123     11.3     695   1  FSHR_MACFA   FOLLICLE STIMULATING H  1.52e-05
   40    122     11.2     692   1  FSHR_RAT     FOLLICLE STIMULATING H  2.21e-05
   41    122     11.2     695   1  FSHR_HUMAN   FOLLICLE STIMULATING H  2.21e-05
   42    120     11.0     695   1  FSHR_PIG     FOLLICLE STIMULATING H  4.69e-05
   43    119     10.9     695   1  FSHR_BOVIN   FOLLICLE STIMULATING H  6.82e-05
   44    118     10.8     694   1  FSHR_HORSE   FOLLICLE STIMULATING H  9.88e-05
   45    116     10.6     567   1  GPV_MOUSE    PLATELET GLYCOPROTEIN   2.07e-04



RESULT 1
ID   SLIT_DROME     STANDARD;      PRT;  1480 AA.
AC   P24014;
DT   01-MAR-1992 (REL. 21, CREATED)
DT   01-MAR-1992 (REL. 21, LAST SEQUENCE UPDATE)
DT   01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE)
DE   SLIT PROTEIN PRECURSOR.
GN   SLI.
OS   DROSOPHILA MELANOGASTER (FRUIT FLY).
OC   EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC   PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC   DROSOPHILIDAE; DROSOPHILA.
RN   [1]
RP   SEQUENCE FROM N.A.
RX   MEDLINE; 91099665.
RA   ROTHBERG J.M., JACOBS J.R., GOODMAN C.S., ARTAVANIS-TSAKONAS S.;
RT   "Slit: an extracellular protein necessary for development of midline
RT   glia and commissural axon pathways contains both EGF and LRR
RT   domains.";
RL   GENES DEV. 4:2169-2187(1990).
CC   -!- FUNCTION: NECESSARY FOR DEVELOPMENT OF MIDLINE GLIA AND
CC       COMMISSURAL AXON PATHWAYS. SLIT MAY INTERACT WITH EXTRACELLULAR
CC       MATRIX MOLECULES.
CC   -!- TISSUE SPECIFICITY: EXCRETED BY THE MIDLINE GLIA CELLS AND
CC       EVENTUALLY DISTRIBUTED ALONG THE AXONS.
CC   -!- ALTERNATIVE PRODUCTS: GIVES RISE TO 2 DISTINCT PROTEINS DIFFERING
CC       BY 11 AA AT THE C-TERMINUS OF THE LAST EGF REPEAT.
CC   -!- SIMILARITY: CONTAINS 7 EGF-LIKE DOMAINS.
CC   -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC       MANY PROTEINS. NUMBER IN THIS PROTEIN: 22, TWO BLOCK OF 6 LRR'S
CC       AND TWO BLOCKS OF 5 LRR'S.
CC   -!- SIMILARITY: CONTAINS A C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK).
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license@isb-sib.ch).
DR   EMBL; X53959; G8615; -.
DR   PIR; A36665; A36665.
DR   FLYBASE; FBgn0003425; sli.
DR   PROSITE; PS00010; ASX_HYDROXYL; 3.
DR   PROSITE; PS00022; EGF_1; 7.
DR   PROSITE; PS01185; CTCK_1; 1.
DR   PROSITE; PS01186; EGF_2; 5.
DR   PROSITE; PS01187; EGF_CA; 2.
DR   PROSITE; PS01225; CTCK_2; 1.
DR   PFAM; PF00007; Cys_knot; 1.
DR   PFAM; PF00008; EGF; 7.
DR   PFAM; PF00054; laminin_G; 1.
DR   PFAM; PF00560; LRR; 10.
DR   HSSP; P00740; 1IXA.
KW   NEUROGENESIS; GLYCOPROTEIN; SIGNAL; ALTERNATIVE SPLICING;
KW   EGF-LIKE DOMAIN; REPEAT; LEUCINE-REPEAT; DUPLICATION.



FT   SIGNAL        1     36
FT   CHAIN        37   1480       SLIT PROTEIN.
FT   DOMAIN       70    104       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      105    230       LEUCINE-RICH REPEATS (1ST REGION).
FT   DOMAIN      231    294       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN      295    326       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      327    452       LEUCINE-RICH REPEATS (2ND REGION).
FT   DOMAIN      453    518       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN      519    550       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      551    653       LEUCINE-RICH REPEATS (3RD REGION).
FT   DOMAIN      654    714       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN      715    746       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      747    848       LEUCINE-RICH REPEATS (4TH REGION).
FT   DOMAIN      849    910       CONSERVED C-FLANKING REGION OF THE LRR.
FT   REPEAT      105    115       LRR 1-1.
FT   REPEAT      116    139       LRR 1-2.
FT   REPEAT      140    163       LRR 1-3.
FT   REPEAT      164    187       LRR 1-4.
FT   REPEAT      188    211       LRR 1-5.
FT   REPEAT      212    230       LRR 1-6.
FT   REPEAT      327    337       LRR 2-1.
FT   REPEAT      338    361       LRR 2-2.
FT   REPEAT      362    385       LRR 2-3.
FT   REPEAT      386    409       LRR 2-4.
FT   REPEAT      410    433       LRR 2-5.
FT   REPEAT      434    452       LRR 2-6.
FT   REPEAT      551    562       LRR 3-1.
FT   REPEAT      563    586       LRR 3-2.
FT   REPEAT      587    610       LRR 3-3.
FT   REPEAT      611    634       LRR 3-4.
FT   REPEAT      635    653       LRR 3-5.
FT   REPEAT      747    757       LRR 4-1.
FT   REPEAT      758    781       LRR 4-2.
FT   REPEAT      782    805       LRR 4-3.
FT   REPEAT      806    829       LRR 4-4.
FT   REPEAT      830    848       LRR 4-5.
FT   DOMAIN      907    944       EGF-LIKE 1.
FT   DOMAIN      946    983       EGF-LIKE 2.
FT   DOMAIN      985   1022       EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL)
FT   DOMAIN     1024   1062       EGF-LIKE 4.
FT   DOMAIN     1064   1100
FT   DOMAIN     1111   1149       EGF-LIKE 6.
FT   DOMAIN     1353   1392
FT   DOMAIN     1409   1480       CTCK.
FT   VARSPLIC   1394   1404       MISSING (IN SHORT FORM).
FT   CARBOHYD    111    111       POTENTIAL.
FT   CARBOHYD    207    207       POTENTIAL.
FT   CARBOHYD    357    357       POTENTIAL.
FT   CARBOHYD    435    435       POTENTIAL.
FT   CARBOHYD    783    783       POTENTIAL.
FT   CARBOHYD    788    788       POTENTIAL.
FT   CARBOHYD    958    958       POTENTIAL.
FT   CARBOHYD    998    998       POTENTIAL.
FT   CARBOHYD   1060   1060       POTENTIAL.
FT   CARBOHYD   1159   1159       POTENTIAL.
FT   CARBOHYD   1175   1175       POTENTIAL.
FT   CARBOHYD   1243   1243       POTENTIAL.
FT   CARBOHYD   1292   1292       POTENTIAL.
FT   DISULFID    911    922       BY SIMILARITY.
FT   DISULFID    916    932       BY SIMILARITY.
FT   DISULFID    934    943       BY SIMILARITY.
FT   DISULFID    950    961       BY SIMILARITY.
FT   DISULFID    955    971       BY SIMILARITY.
FT   DISULFID    973      ?       BY SIMILARITY.
FT   DISULFID      ?   1001       BY SIMILARITY.
FT   DISULFID    995   1010       BY SIMILARITY.
FT   DISULFID      ?      ?       BY SIMILARITY.
FT   DISULFID   1028   1041       BY SIMILARITY.
FT   DISULFID   1035   1050       BY SIMILARITY.
FT   DISULFID      ?   1061       BY SIMILARITY.
FT   DISULFID   1068   1079       BY SIMILARITY.
FT   DISULFID   1073   1088       BY SIMILARITY.
FT   DISULFID   1090   1099       BY SIMILARITY.
FT   DISULFID   1115   1125       BY SIMILARITY.
FT   DISULFID   1120   1137       BY SIMILARITY.
FT   DISULFID   1139   1148       BY SIMILARITY.
FT   DISULFID   1357   1368       BY SIMILARITY.
FT   DISULFID   1362   1380       BY SIMILARITY.
FT   DISULFID   1382   1391       BY SIMILARITY.
FT   DISULFID   1409   1443       BY SIMILARITY.
FT   DISULFID   1423   1457       BY SIMILARITY.
FT   DISULFID   1434   1473       BY SIMILARITY.
FT   DISULFID   1438   1475       BY SIMILARITY.
FT   DISULFID   1442   1479       BY SIMILARITY.
SQ   SEQUENCE   1480 AA;  165752 MW;  2CD1C421 CRC32;



Query Match 44.2%; Score 483; DB 1; Length 1480; 

Best Local Similarity 44.2%; Pred. No. 1.19e-80;

Matches 69; Conservative 40; Mismatches 44; Indels 3; Gaps 3; 

Db 451 KNPFICDCNLRWLAD-YLHKNPIETSGARCESPKRMHRRRIESLREEKFKCSWGELRMKL 509 

:l llllhlll: |:|| IlililH! Ill:::::: :| :|||| :| : : 
Qy 1 RNPXICDCNLQWLAQINLQKN-IETSGARCEQPKRLRKKKFATLPPNKFKCKGSESFVSM 59 

Db 510 SGE-CRMDSXPAMCHCEGTTVDCTGRRLKEIPRDIPLHTTELLLNDNELGRISSDGLFG 568 

I :M II: I I MINI I |; II II :|:||h I::: : :: : 
Qy 60 YADSCFIDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIH 119 

Db 569 RLPHLVKLELKRNQLTGIEPNAFEGASHIQELQLGE 604 

I :l hi I:: |: ::|| I ::|| I ; 
Qy 120 VLENLEXLDLSNNHITFINDKSFEKLSKLRELXLND 155 
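
The two percentages printed above each alignment appear to follow directly from the other numbers in the report. The sketch below is inferred from those numbers rather than from any documentation: it assumes Query Match is the score divided by the Perfect Score (1092 for this query), and Best Local Similarity is the number of identical matches divided by all aligned columns. The function names are illustrative only.

```python
def query_match_percent(score, perfect_score):
    """Score as a percentage of the query's self-match score (assumed)."""
    return 100.0 * score / perfect_score


def best_local_similarity(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all aligned columns (assumed)."""
    aligned = matches + conservative + mismatches + indels
    return 100.0 * matches / aligned


# RESULT 1 (SLIT_DROME): Score 483; Matches 69, Conservative 40,
# Mismatches 44, Indels 3.
print(round(query_match_percent(483, 1092), 1))
print(round(best_local_similarity(69, 40, 44, 3), 1))
```

Both values come out at 44.2 for RESULT 1, and the same formulas reproduce the 18.0% and 34.8% reported for the biglycan hits (Score 197; Matches 31, Conservative 23, Mismatches 34, Indels 1).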



RESULT 2 

ID   PGS1_RAT       STANDARD;      PRT;   369 AA.
AC   P47853;
DT   01-FEB-1996 (REL. 33, CREATED)
DT   01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)
DT   01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)
DE   BONE/CARTILAGE PROTEOGLYCAN I PRECURSOR (BIGLYCAN) (PG-S1).
GN   BGN.
OS   RATTUS NORVEGICUS (RAT).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC   RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN   [1]
RP   SEQUENCE FROM N.A.
RC   TISSUE=VASCULAR SMOOTH MUSCLE;
RX   MEDLINE; 91184222.
RA   DREHER K.L., ASUNDI V., MATZURA D., COWAN K.;
RT   "Vascular smooth muscle biglycan represents a highly conserved
RT   proteoglycan within the arterial wall.";
RL   EUR. J. CELL BIOL. 53:296-304(1990).
CC   -!- TISSUE SPECIFICITY: FOUND IN THE EXTRACELLULAR MATRICES OF SEVERAL
CC       CONNECTIVE TISSUES, SPECIALLY IN ARTICULAR CARTILAGES.
CC   -!- PTM: THE TWO GLYCOSAMINOGLYCAN CHAINS ATTACHED TO BIGLYCAN CAN BE
CC       EITHER CHONDROITIN SULFATE OR DERMATAN SULFATE (BY SIMILARITY).
CC   -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS
CC       FAMILY.
CC   -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC       MANY PROTEINS. NUMBER IN THIS PROTEIN: 12.
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license@isb-sib.ch).
DR   EMBL; U17834; G600498; -.
DR   PFAM; PF00560; LRR; 5.
KW   GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN;
KW   REPEAT; LEUCINE-REPEAT; SIGNAL.



FT   SIGNAL        1     19       POTENTIAL.
FT   PROPEP       20     37
FT   CHAIN        38    369       BONE/CARTILAGE PROTEOGLYCAN I.
FT   DOMAIN       72    343       LEUCINE-RICH REPEATS.
FT   REPEAT       72     95       LRR 1.
FT   REPEAT       96    116       LRR 2.
FT   REPEAT      117    140       LRR 3.
FT   REPEAT      141    164       LRR 4.
FT   REPEAT      165    185       LRR 5.
FT   REPEAT      186    211       LRR 6.
FT   REPEAT      212    231       LRR 7.
FT   REPEAT      232    255       LRR 8.
FT   REPEAT      256    276       LRR 9.
FT   REPEAT      277    302       LRR 10.
FT   REPEAT      303    321       LRR 11.
FT   REPEAT      336    343       LRR 12.
FT   CARBOHYD     42     42       GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT   CARBOHYD     48     48       GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT   CARBOHYD    271    271       POTENTIAL.
FT   CARBOHYD    312    312       POTENTIAL.
FT   DISULFID     64     77       BY SIMILARITY.
FT   DISULFID    322    355       BY SIMILARITY.
SQ   SEQUENCE   369 AA;  41706 MW;  6555ECED CRC32;



Query Match 18.0%; Score 197; DB 1; Length 369; 

Best Local Similarity 34.8%; Pred. No. 4.95e-19;

Matches 31; Conservative 23; Mismatches 34; Indels 1; Gaps 1;

Db 60 FSAMCPFGCHCHLRVVQCSDLGLKTVPKEISPDTTLLDLQNNDISELR-KDDFKGLQHLY 118

: "II I I hi: 1 1 : 1 : 1 h :| I I |:|| : : :: |::| 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125

Db 119 ALVLVNNKISKIHEKAFSPLRKLQKLYIS 147 

I I II I: ::!:! I l|: I :: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 3
ID   PGS1_MOUSE     STANDARD;      PRT;   369 AA.
AC   P28653; Q61355;
DT   01-DEC-1992 (REL. 24, CREATED)
DT   01-DEC-1992 (REL. 24, LAST SEQUENCE UPDATE)
DT   15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE   BONE/CARTILAGE PROTEOGLYCAN I PRECURSOR (BIGLYCAN) (PG-S1).
GN   BGN.
OS   MUS MUSCULUS (MOUSE).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC   RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN   [1]
RP   SEQUENCE FROM N.A.
RC   STRAIN=NIH SWISS; TISSUE=FIBROBLAST;
RA   NAITOH Y., SUZUKI S.;
RL   SUBMITTED (JUL-1990) TO EMBL/GENBANK/DDBJ DATA BANKS.
RN   [2]
RP   SEQUENCE FROM N.A.
RC   STRAIN=NIH SWISS; TISSUE=EMBRYO;
RX   MEDLINE; 94319093.
RA   RAU W., JUST W., VETTER U., VOGEL W.;
RT   "A dinucleotide repeat in the mouse biglycan gene (EST) on the X
RT   chromosome.";
RL   MAMM. GENOME 5:395-396(1994).
CC   -!- TISSUE SPECIFICITY: FOUND IN THE EXTRACELLULAR MATRICES OF SEVERAL
CC       CONNECTIVE TISSUES, SPECIALLY IN ARTICULAR CARTILAGES.
CC   -!- PTM: THE TWO GLYCOSAMINOGLYCAN CHAINS ATTACHED TO BIGLYCAN CAN BE
CC       EITHER CHONDROITIN SULFATE OR DERMATAN SULFATE (BY SIMILARITY).
CC   -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS
CC       FAMILY.
CC   -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC       MANY PROTEINS. NUMBER IN THIS PROTEIN: 12.
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license@isb-sib.ch).
DR   EMBL; X53928; G53667; -.
DR   EMBL; L20276; G348962; -.
DR   PIR; S20811; S20811.
DR   MGD; MGI:88158; BGN.
DR   PFAM; PF00560; LRR; 5.
KW   GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN;
KW   REPEAT; LEUCINE-REPEAT; SIGNAL.


FT   SIGNAL        1     19       POTENTIAL.
FT   PROPEP       20     37
FT   CHAIN        38    369       BONE/CARTILAGE PROTEOGLYCAN I.
FT   DOMAIN       72    343       LEUCINE-RICH REPEATS.
FT   REPEAT       72     95       LRR 1.
FT   REPEAT       96    116       LRR 2.
FT   REPEAT      117    140       LRR 3.
FT   REPEAT      141    164       LRR 4.
FT   REPEAT      165    185       LRR 5.
FT   REPEAT      186    211       LRR 6.
FT   REPEAT      212    231       LRR 7.
FT   REPEAT      232    255       LRR 8.
FT   REPEAT      256    276       LRR 9.
FT   REPEAT      277    302       LRR 10.
FT   REPEAT      303    321       LRR 11.
FT   REPEAT      336    343       LRR 12.
FT   CARBOHYD     42     42       GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT   CARBOHYD     48     48       GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT   CARBOHYD    271    271       POTENTIAL.
FT   CARBOHYD    312    312       POTENTIAL.
FT   DISULFID     64     77       BY SIMILARITY.
FT   DISULFID    322    355       BY SIMILARITY.
FT   CONFLICT     68     68       C -> W (IN REF. 2).
SQ   SEQUENCE   369 AA;  41639 MW;  ED21DD6B CRC32;



Query Match 18.0%; Score 197; DB 1; Length 369;

Best Local Similarity 34.8%; Pred. No. 4.95e-19; 

Matches 31; Conservative 23; Mismatches 34; Indels 1; Gaps 1; 

Db 60 FSAMCPFGCHCHLRVVQCSDLGLKTVPKEISPDTTLLDLQNNDISELR-KDDFKGLQHLY 118

: ::ll I I |:|: lh|:| |: :| I I |:|| : : :: |::| 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125

Db 119 ALVLVNNKISKIHEKAFSPLRKLQKLYIS 147 

I I IN: h:hl I l|: I :: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154



RESULT 4
ID   PGS2_BOVIN     STANDARD;      PRT;   360 AA.
AC   P21793;
DT   01-MAY-1991 (REL. 18, CREATED)
DT   01-MAY-1991 (REL. 18, LAST SEQUENCE UPDATE)
DT   01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)
DE   BONE PROTEOGLYCAN II PRECURSOR (PG-S2) (DECORIN).
GN   DCN.
OS   BOS TAURUS (BOVINE).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC   ARTIODACTYLA; RUMINANTIA; PECORA; BOVOIDEA; BOVIDAE; BOVINAE; BOS.
RN   [1]
RP   SEQUENCE FROM N.A.
RX   MEDLINE; 88133946.
RA   DAY A.A., MCQUILLAN C.I., TERMINE J.D., YOUNG M.R.;
RT   "Molecular cloning and sequence analysis of the cDNA for small
RT   proteoglycan II of bovine bone.";
RL   BIOCHEM. J. 248:801-805(1987).
RN   [2]
RP   SEQUENCE OF 31-54.
RX   MEDLINE; 89123388.
RA   CHOI H.U., JOHNSON T.L., PAL S., TANG L.H., ROSENBERG L., NEAME P.J.;
RT   "Characterization of the dermatan sulfate proteoglycans, DS-PGI and
RT   DS-PGII, from bovine articular cartilage and skin isolated by octyl-
RT   sepharose chromatography.";
RL   J. BIOL. CHEM. 264:2876-2884(1989).
CC   -!- FUNCTION: BINDS TO TYPE I AND TYPE II COLLAGEN AND AFFECTS THE
CC       RATE OF FIBRILS FORMATION. ALSO BINDS TO FIBRONECTIN AND TGF-
CC       BETA.
CC   -!- PTM: THE GLYCOSAMINOGLYCAN CHAIN ATTACHED TO DECORIN CAN BE EITHER
CC       CHONDROITIN SULFATE OR DERMATAN SULFATE DEPENDING UPON THE
CC       TISSUE OF ORIGIN.
CC   -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS
CC       FAMILY.
CC   -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC       MANY PROTEINS. NUMBER IN THIS PROTEIN: 10.
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license@isb-sib.ch).
DR   EMBL; Y00712; G619; -.
DR   PIR; S06280; S06280.
DR   PIR; B31430; B31430.
DR   PFAM; PF00560; LRR; 5.
KW   GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN;
KW   LEUCINE-REPEAT; SIGNAL.
FT   SIGNAL        1     16       POTENTIAL.
FT   PROPEP       17     30
FT   CHAIN        31    360       BONE PROTEOGLYCAN II.
FT   DOMAIN       78    309       LEUCINE-RICH REPEATS.
FT   REPEAT       78     99       LRR 1.
FT   REPEAT      100    123       LRR 2.
FT   REPEAT      124    146       LRR 3.
FT   REPEAT      147    168       LRR 4.
FT   REPEAT      169    194       LRR 5.
FT   REPEAT      195    218       LRR 6.
FT   REPEAT      219    239       LRR 7.
FT   REPEAT      240    263       LRR 8.
FT   REPEAT      264    286       LRR 9.
FT   REPEAT      287    309       LRR 10.
FT   CARBOHYD     34     34       GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT   CARBOHYD    212    212       POTENTIAL.
FT   CARBOHYD    263    263       POTENTIAL.
FT   CARBOHYD    304    304       POTENTIAL.
FT   DISULFID     55     68       BY SIMILARITY.
FT   DISULFID    314    347       BY SIMILARITY.
SQ   SEQUENCE   360 AA;  39837 MW;  92C2A1A5 CRC32;



Query Match 17.9%; Score 195; DB 1; Length 360; 

Best Local Similarity 34.4%; Pred. No. 1.22e-18;

Matches 31; Conservative 21; Mismatches 35; Indels 3; Gaps 3; 

Db 51 MGPVCPFRCQCHLRVYQCSDLGLEKVPKDLPP-DTALLDLQNNKITEIR-DGDFKNLKNL 108 

: ::ll :|:l |:|: lh =1 :| I II I |:|: : :::: I II 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLL-LSGNNISTVDLNSNIHVLENL 124

Db 109 HTLILINNKISKISPGAFAPLVKLERLYLS 138 

I I II I: I: :| III I I: 
Qy 125 EXLDLSNNHITFINDKSFEKLSKLRELXLN 154
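
The "Best Local Similarity" figure reported with each alignment appears to be the number of identical matches divided by the total number of aligned columns (matches + conservative + mismatches + indels). That reading is an assumption inferred from the counts printed in this report, not a definition stated by the search program. A minimal Python sketch, using a hypothetical helper name:

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent identity over all aligned columns (assumed definition)."""
    columns = matches + conservative + mismatches + indels
    if columns == 0:
        raise ValueError("alignment has no columns")
    return 100.0 * matches / columns


# Counts from the alignment above:
# Matches 31; Conservative 21; Mismatches 35; Indels 3
print(round(best_local_similarity(31, 21, 35, 3), 1))
```

Under this assumption the counts above give the 34.4% printed in the report.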



RESULT 5
ID PGS2_HUMAN STANDARD; PRT; 359 AA.
AC P07585;
DT 01-APR-1988 (REL. 07, CREATED)
DT 01-APR-1988 (REL. 07, LAST SEQUENCE UPDATE)
DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)
DE BONE PROTEOGLYCAN II PRECURSOR (PG-S2) (DECORIN) (PG40).
GN DCN.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 87017013.
RA KRUSIUS T., RUOSLAHTI E.;
RT "Primary structure of an extracellular matrix proteoglycan core
RT protein deduced from cloned cDNA.";
RL PROC. NATL. ACAD. SCI. U.S.A. 83:7683-7687(1986).
RN [2]
RP SEQUENCE FROM N.A.
RC TISSUE=LUNG;
RX MEDLINE; 93162643.
RA VETTER U., VOGEL W., JUST W., YOUNG M.F., FISHER L.W.;
RT "Human decorin gene: intron-exon junctions and chromosomal
RT localization.";
RL GENOMICS 15:161-168(1993).
RN [3]
RP SEQUENCE OF 1-70 FROM N.A.
RX MEDLINE; 93162642.
RA DANIELSON K.G., FAZZIO A., COHEN I.R., CANNIZZARO L., IOZZO R.V.;
RT "The human decorin gene: intron-exon organization, discovery of two
RT alternatively spliced exons in the 5' untranslated region, and
RT mapping of the gene to chromosome 12q23.";
RL GENOMICS 15:146-160(1993).
RN [4]
RP SEQUENCE OF 31-50.
RX MEDLINE; 90073579.
RA ROUGHLEY P.J., WHITE R.J.;
RT "Dermatan sulphate proteoglycans of human articular cartilage. The
RT properties of dermatan sulphate proteoglycans I and II.";
RL BIOCHEM. J. 262:823-827(1989).
RN [5]
RP SEQUENCE OF 31-49.
RX MEDLINE; 87250639.
RA FISHER L.W., HAWKINS G.R., TUROSS N., TERMINE J.D.;
RT "Purification and partial characterization of small proteoglycans I
RT and II, bone sialoproteins I and II, and osteonectin from the mineral
RT compartment of developing human bone.";
RL J. BIOL. CHEM. 262:9702-9708(1987).



CC -!- FUNCTION: BINDS TO TYPE I AND TYPE II COLLAGEN AND AFFECTS THE
CC RATE OF FIBRILS FORMATION. ALSO BINDS TO FIBRONECTIN AND TGF-
CC BETA.
CC -!- PTM: THE GLYCOSAMINOGLYCAN CHAIN ATTACHED TO DECORIN CAN BE EITHER
CC CHONDROITIN SULFATE OR DERMATAN SULFATE DEPENDING UPON THE
CC TISSUE OF ORIGIN.
CC -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS
CC FAMILY.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 10.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC



DR EMBL; M14219; G181170; -.
DR EMBL; L01131; G181519; ALT_SEQ.
DR EMBL; L01125; G181519; JOINED.
DR EMBL; L01126; G181519; JOINED.
DR EMBL; L01127; G181519; JOINED.
DR EMBL; L01129; G181519; JOINED.
DR EMBL; L01130; G181519; JOINED.
DR EMBL; M98262; G609452; -.
DR PIR; A26476; NBHDC8.
DR PIR; S05640; S05640.
DR PIR; B28457; B28457.
DR PIR; A45016; A45016.
DR MIM; 125255; -.
DR PFAM; PF00560; LRR; 5.
KW GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN;
KW REPEAT; LEUCINE-REPEAT; SIGNAL.



FT SIGNAL       1     16 POTENTIAL.
FT PROPEP      17     30
FT CHAIN       31    359 BONE PROTEOGLYCAN II.
FT DOMAIN      77    308 LEUCINE-RICH REPEATS.
FT REPEAT      77     98 LRR 1.
FT REPEAT      99    122 LRR 2.
FT REPEAT     123    145 LRR 3.
FT REPEAT     146    167 LRR 4.
FT REPEAT     168    193 LRR 5.
FT REPEAT     194    217 LRR 6.
FT REPEAT     218    238 LRR 7.
FT REPEAT     239    262 LRR 8.
FT REPEAT     263    285 LRR 9.
FT REPEAT     286    308 LRR 10.
FT CARBOHYD    34     34 GLYCOSAMINOGLYCAN.
FT CARBOHYD   211    211 POTENTIAL.
FT CARBOHYD   262    262 POTENTIAL.
FT CARBOHYD   303    303 POTENTIAL.
FT DISULFID    54     67 BY SIMILARITY.
FT DISULFID   313    346 BY SIMILARITY.
FT CONFLICT    37     37 G -> A (IN REF. 5).
FT CONFLICT    45     45 D -> P (IN REF. 5).
SQ SEQUENCE 359 AA; 39746 MW; F2051F3A CRC32;



Query Match 17.7%; Score 193; DB 1; Length 359; 

Best Local Similarity 32.6%; Pred. No. 2.98e-18;

Matches 29; Conservative 23; Mismatches 36; Indels 1; Gaps 1;

Db 50 LGPVCPFRCQCHLRVVQCSDLGLDKVPKDLPPDTTLLDLQNNKITEIK-DGDFKNLKNLH 108

: :|:| |:|: II: :| :| :| I I |:|: : :::: I II 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 109 ALILVNNKISKVSPGAFTPLVKLERLYLS 137 

I I III: :: :| Ml I |: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 

RESULT 6

ID PGS2_CANFA STANDARD; PRT; 360 AA.

AC Q29393;

DT 15-JUL-1998 (REL. 36, CREATED)

DT 15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE BONE PROTEOGLYCAN II PRECURSOR (PG-S2) (DECORIN).

GN DCN OR DCN1C.

OS CANIS FAMILIARIS (DOG).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC CARNIVORA; FISSIPEDIA; CANIDAE; CANIS. 

RN [1] 

RP SEQUENCE FROM N.A.

RA GLANT T.T.; 

RL SUBMITTED (DEC-1996) TO EMBL/GENBANK/DDBJ DATA BANKS. 

RN [2] 

RP SEQUENCE OF 244-259 FROM N.A. 

RA VENTA P.J., BROUILLETTE J.A., YUZBASIYAN-GURKAN V., BREWER G.J.; 

RL SUBMITTED (APR-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.

CC -!- FUNCTION: BINDS TO TYPE I AND TYPE II COLLAGEN AND AFFECTS THE
CC RATE OF FIBRILS FORMATION. ALSO BINDS TO FIBRONECTIN AND TGF-
CC BETA (BY SIMILARITY).
CC -!- PTM: THE GLYCOSAMINOGLYCAN CHAIN ATTACHED TO DECORIN CAN BE EITHER
CC CHONDROITIN SULFATE OR DERMATAN SULFATE DEPENDING UPON THE
CC TISSUE OF ORIGIN (BY SIMILARITY).
CC -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS
CC FAMILY.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 10.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; 083141; G1916848; -. 

DR EMBL; L77684; G1280195; -. 

DR PFAM; PF00560; LRR; 5. 

KW GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN; 

KW REPEAT; LEUCINE-REPEAT; SIGNAL. 



FT SIGNAL       1     16 POTENTIAL.
FT PROPEP      17     30 BY SIMILARITY.
FT CHAIN       31    360 BONE PROTEOGLYCAN II.
FT DOMAIN      78    309 LEUCINE-RICH REPEATS.
FT REPEAT      78     99 LRR 1.
FT REPEAT     100    123 LRR 2.
FT REPEAT     124    146 LRR 3.
FT REPEAT     147    168 LRR 4.
FT REPEAT     169    194 LRR 5.
FT REPEAT     195    218 LRR 6.
FT REPEAT     219    239 LRR 7.
FT REPEAT     240    263 LRR 8.
FT REPEAT     264    286 LRR 9.
FT REPEAT     287    309 LRR 10.
FT CARBOHYD    34     34 GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT CARBOHYD   212    212 POTENTIAL.
FT CARBOHYD   263    263 POTENTIAL.
FT CARBOHYD   304    304 POTENTIAL.
FT DISULFID    55     68 BY SIMILARITY.
FT DISULFID   314    347 BY SIMILARITY.
SQ SEQUENCE 360 AA; 39980 MW; 601D533F CRC32;
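
Each feature-table (FT) row in an entry like this one carries a feature key, inclusive start and end residue positions, and an optional description. The sketch below is only an illustration of reading such rows: it assumes whitespace-separated fields and plain integer positions, whereas real SWISS-PROT files use fixed columns and may flag uncertain ends with `<`, `>` or `?`. The helper names `parse_ft_line` and `feature_length` are hypothetical.

```python
def parse_ft_line(line):
    """Split one FT line into (key, start, end, description)."""
    # At most four splits, so the free-text description stays in one piece.
    parts = line.split(None, 4)
    if len(parts) < 4 or parts[0] != "FT":
        raise ValueError("not a complete FT line: %r" % line)
    key, start, end = parts[1], int(parts[2]), int(parts[3])
    description = parts[4].strip() if len(parts) == 5 else ""
    return key, start, end, description


def feature_length(start, end):
    """Number of residues covered by an inclusive start..end range."""
    return end - start + 1


# A row from the PGS2_CANFA feature table.
key, start, end, description = parse_ft_line("FT REPEAT 78 99 LRR 1.")
print(key, start, end, description, feature_length(start, end))
```

Splitting on whitespace keeps the sketch tolerant of the collapsed spacing in this extracted text; a parser for the original flat files would more likely slice fixed column ranges.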



Query Match 17.7%; Score 193; DB 1; Length 360;

Best Local Similarity 33.3%; Pred. No. 2.98e-18;

Matches 30; Conservative 23; Mismatches 36; Indels 1; Gaps 1;

Db 50 LLGPVCPFRCQCHLRVVQCSDLGLDKVPKDLPPDTTLLDLQNNKITEIK-DGDFKNLKNL 108

:: ::H =hl |:|: II: :| :| :| I I |:|: : :::: I || 
Qy 65 FIDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENL 124 

Db 109 HTLILVNNKISKISPGAFTPLLKLERLYLS 138 

I I II I: I: :! I II I |: 
Qy 125 EXLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 7 

ID PGS1_HUMAN STANDARD; PRT; 368 AA.

AC P21810; P13247;

DT 01-JAN-1990 (REL. 13, CREATED)

DT 01-APR-1993 (REL. 25, LAST SEQUENCE UPDATE)

DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)

DE BONE/CARTILAGE PROTEOGLYCAN I PRECURSOR (BIGLYCAN) (PG-S1).

GN BGN.

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=BONE;

RX MEDLINE; 89174714. 

RA FISHER L.W., TERMINE J.D., YOUNG M.F.; 

RT "Deduced protein sequence of bone small proteoglycan I (biglycan) 

RT shows homology with proteoglycan II (decorin) and several 

RT nonconnective tissue proteins in a variety of species.";

RL J. BIOL. CHEM. 264:4571-4576(1989). 

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 91317791.

RA FISHER L.W., HEEGAARD A.M., VETTER U., VOGEL W., JUST W.,

RA TERMINE J.D., YOUNG M.F.;

RT "Human biglycan gene. Putative promoter, intron-exon junctions, and

RT chromosomal localization.";

RL J. BIOL. CHEM. 266:14371-14377(1991).

RN [3]

RP SEQUENCE OF 38-57. 

RX MEDLINE; 90073579. 

RA ROUGHLEY P.J., WHITE R.J.; 

RT "Dermatan sulphate proteoglycans of human articular cartilage. The 

RT properties of dermatan sulphate proteoglycans I and II.";

RL BIOCHEM. J. 262:823-827(1989). 

RN [4] 

RP SEQUENCE OF 38-66. 

RX MEDLINE; 87250639. 

RA FISHER L.W., HAWKINS G.R., TUROSS N., TERMINE J.D.;

RT "Purification and partial characterization of small proteoglycans I 

RT and II, bone sialoproteins I and II, and osteonectin from the mineral 

RT compartment of developing human bone."; 

RL J. BIOL. CHEM. 262:9702-9708(1987). 

CC -!- TISSUE SPECIFICITY: FOUND IN THE EXTRACELLULAR MATRICES OF SEVERAL
CC CONNECTIVE TISSUES, SPECIALLY IN ARTICULAR CARTILAGES.
CC -!- PTM: THE TWO GLYCOSAMINOGLYCAN CHAINS ATTACHED TO BIGLYCAN CAN BE
CC EITHER CHONDROITIN SULFATE OR DERMATAN SULFATE.
CC -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS
CC FAMILY.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 12.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; J04599; G306884; -. 

DR EMBL; M65153; G179433; ALT_SEQ.

DR EMBL; M65152; G179433; JOINED. 

DR PIR; A28457; A28457. 

DR PIR; A32458; A32458. 

DR PIR; A40757; A40757. 

DR PIR; S05639; S05639. 

DR MIM; 301870; -. 

DR PFAM; PF00560; LRR; 5. 

KW GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN; 

KW REPEAT; LEUCINE-REPEAT; SIGNAL. 



FT SIGNAL       1     19 POTENTIAL.
FT PROPEP      20     37
FT CHAIN       38    368 BONE/CARTILAGE PROTEOGLYCAN I.
FT DOMAIN      71    342 LEUCINE-RICH REPEATS.
FT REPEAT      71     94 LRR 1.
FT REPEAT      95    115 LRR 2.
FT REPEAT     116    139 LRR 3.
FT REPEAT     140    163 LRR 4.
FT REPEAT     164    184 LRR 5.
FT REPEAT     185    210 LRR 6.
FT REPEAT     211    230 LRR 7.
FT REPEAT     231    254 LRR 8.
FT REPEAT     255    275 LRR 9.
FT REPEAT     276    301 LRR 10.
FT REPEAT     302    320 LRR 11.
FT REPEAT     335    342 LRR 12.
FT CARBOHYD    42     42 GLYCOSAMINOGLYCAN.
FT CARBOHYD    47     47 GLYCOSAMINOGLYCAN.
FT CARBOHYD   270    270 POTENTIAL.
FT CARBOHYD   311    311 POTENTIAL.
FT DISULFID    63     76 BY SIMILARITY.
FT DISULFID   321    354 BY SIMILARITY.
FT CONFLICT   139    140 KL -> NV (IN REF. 1).
FT CONFLICT   163    164 EL -> DV (IN REF. 1).
SQ SEQUENCE 368 AA; 41654 MW; 6820F8DF CRC32;



Query Match 17.7%; Score 193; DB 1; Length 368; 

Best Local Similarity 34.5%; Pred. No. 2.98e-18;

Matches 30; Conservative 23; Mismatches 33; Indels 1; Gaps 1;

Db 61 AMCPFGCHCHLRVVQCSDLGLKSVPKEISPDTTLLDLQNNDISELR-KDDFKGLQHLYAL 119

■ ::H I I 1:1: l|:::| h :| I I hi! : : :: |::| I 
Qy 68 SICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLEXL 127 

Db 120 VLVNNKISKIHEKAFSPLRKLQKLYIS 146 

I II I: |::|:| I II: I :: 
Qy 128 DLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 8 

ID PGS1_BOVIN STANDARD; PRT; 369 AA.

AC P21809; P79259;

DT 01-MAY-1991 (REL. 18, CREATED)

DT 15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE BONE/CARTILAGE PROTEOGLYCAN I PRECURSOR (BIGLYCAN) (LEUCINE-RICH PG I) 

DE (PG-S1). 

GN BGN. 

OS BOS TAURUS (BOVINE). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC ARTIODACTYLA; RUMINANTIA; PECORA; BOVOIDEA; BOVIDAE; BOVINAE; BOS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=AORTA; 

RX MEDLINE; 96113563. 

RA XU J.H., RADHAKRISHNAMURTHY B., SRINIVASAN S.R., BERENSON G.S.; 

RT "Primary structure of bovine aorta biglycan core protein deduced from 

RT cloned cDNA.";

RL BIOCHEM. MOL. BIOL. INT. 37:263-272(1995). 

RN [2] 

RP SEQUENCE OF 38-369. 

RC TISSUE=CARTILAGE;

RX MEDLINE; 89255324. 

RA NEAME P.J., CHOI H.U., ROSENBERG L.C.; 

RT "The primary structure of the core protein of the small, leucine-rich 

RT proteoglycan (PG I) from bovine articular cartilage."; 

RL J. BIOL. CHEM. 264:8653-8661(1989). 

RN [3] 

RP SEQUENCE OF 38-63. 

RC TISSUE=CARTILAGE;

RX MEDLINE; 89123388. 

RA CHOI H.U., JOHNSON T.L., PAL S., TANG L.H., ROSENBERG L.C., 

RA NEAME P.J.;

RT "Characterization of the dermatan sulfate proteoglycans, DS-PGI and 

RT DS-PGII, from bovine articular cartilage and skin isolated by octyl- 

RT sepharose chromatography."; 

RL J. BIOL. CHEM. 264:2876-2884(1989). 

CC -!- TISSUE SPECIFICITY: FOUND IN THE EXTRACELLULAR MATRICES OF SEVERAL 

CC CONNECTIVE TISSUES, SPECIALLY IN ARTICULAR CARTILAGES. 

CC -!- PTM: THE TWO GLYCOSAMINOGLYCAN CHAINS ATTACHED TO BIGLYCAN CAN BE

CC EITHER CHONDROITIN SULFATE OR DERMATAN SULFATE.

CC -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS

CC FAMILY.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 10.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; S82652; G1835865; -.

DR PIR; A33701; A33701.

DR PFAM; PF00560; LRR; 5.

KW GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN;

KW SIGNAL; REPEAT; LEUCINE-REPEAT.


FT SIGNAL       1     19 POTENTIAL.
FT PROPEP      20     37
FT CHAIN       38    369 BONE/CARTILAGE PROTEOGLYCAN I.
FT DOMAIN      93    316 LEUCINE-RICH REPEATS.
FT REPEAT      93    106 LRR 1.
FT REPEAT     117    130 LRR 2.
FT REPEAT     141    154 LRR 3.
FT REPEAT     162    175 LRR 4.
FT REPEAT     186    199 LRR 5.
FT REPEAT     211    224 LRR 6.
FT REPEAT     232    245 LRR 7.
FT REPEAT     256    269 LRR 8.
FT REPEAT     280    288 LRR 9.
FT REPEAT     303    316 LRR 10.
FT CARBOHYD    42     42 GLYCOSAMINOGLYCAN.
FT CARBOHYD    48     48 GLYCOSAMINOGLYCAN.
FT CARBOHYD   271    271
FT CARBOHYD   312    312
FT DISULFID    64     77
FT DISULFID   322    355
FT CONFLICT   152    152 C -> V (IN REF. 2).
FT CONFLICT   188    188 C -> E (IN REF. 2).
FT CONFLICT   354    354 A -> R (IN REF. 2).
FT CONFLICT   368    369 KK -> Y (IN REF. 2).
SQ SEQUENCE 369 AA; 41509 MW; F1CC673B CRC32;



Query Match 17.7%; Score 193; DB 1; Length 369; 

Best Local Similarity 34.5%; Pred. No. 2.98e-18;

Matches 30; Conservative 23; Mismatches 33; Indels 1; Gaps 1; 

Db 62 AMCPFGCHCHLRVVQCSDLGLKAVPKEISPDTTLLDLQNNDISELR-KDDFKGLQHLYAL 120 

"II I I 1:1: ll:::| |: :| I I |:|| : : :: |::| | 
Qy 68 SICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLEXL 127 

Db 121 VLVNNKISKIHEKAFSPLRKLQKLYIS 147 

I II I: l::|:| I II: I :: 
Qy 128 DLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 9 

ID PGS2_CHICK STANDARD; PRT; 357 AA.

AC P28675;

DT 01-DEC-1992 (REL. 24, CREATED)

DT 01-DEC-1992 (REL. 24, LAST SEQUENCE UPDATE)
DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)
DE BONE PROTEOGLYCAN II PRECURSOR (PG-S2) (DECORIN).

OS GALLUS GALLUS (CHICKEN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ARCHOSAURIA; AVES;

OC NEOGNATHAE; GALLIFORMES; PHASIANIDAE; PHASIANINAE; GALLUS.

RN [1] 

RP SEQUENCE FROM N.A., AND PARTIAL SEQUENCE. 

RC STRAIN=WHITE LEGHORN; TISSUE=CORNEA;

RX MEDLINE; 92296755.

RA LI W., VERGNES J.P., CORNUET P.K., HASSEL J.R.;

RT "cDNA clone to chick corneal chondroitin/dermatan sulfate

RT proteoglycan reveals identity to decorin.";

RL ARCH. BIOCHEM. BIOPHYS. 296:190-197(1992).

CC -!- FUNCTION: BINDS TO TYPE I AND TYPE II COLLAGEN AND AFFECTS THE
CC RATE OF FIBRILS FORMATION. ALSO BINDS TO FIBRONECTIN AND TGF-
CC BETA.
CC -!- PTM: THE GLYCOSAMINOGLYCAN CHAIN ATTACHED TO DECORIN CAN BE EITHER
CC CHONDROITIN SULFATE OR DERMATAN SULFATE DEPENDING UPON THE
CC TISSUE OF ORIGIN (BY SIMILARITY).
CC -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS
CC FAMILY.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 10.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 



CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; X63797; G62888; -. 

DR PIR; S22197; S22197. 

DR PIR; S24317; S24317. 

DR PFAM; PF00560; LRR; 5.

KW GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN; 

KW REPEAT; LEUCINE-REPEAT; SIGNAL. 



FT SIGNAL       1     16
FT PROPEP      17     30
FT CHAIN       31    357 BONE PROTEOGLYCAN II.
FT DOMAIN      75    306 LEUCINE-RICH REPEATS.
FT REPEAT      75     96 LRR 1.
FT REPEAT      97    120 LRR 2.
FT REPEAT     121    143 LRR 3.
FT REPEAT     144    165 LRR 4.
FT REPEAT     166    191 LRR 5.
FT REPEAT     192    215 LRR 6.
FT REPEAT     216    236 LRR 7.
FT REPEAT     237    260 LRR 8.
FT REPEAT     261    283 LRR 9.
FT REPEAT     284    306 LRR 10.
FT CARBOHYD    34     34 GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT CARBOHYD   209    209 POTENTIAL.
FT CARBOHYD   260    260 POTENTIAL.
FT DISULFID    52     65 BY SIMILARITY.
FT DISULFID   311    344 BY SIMILARITY.
SQ SEQUENCE 357 AA; 39687 MW; 48F51E32 CRC32;



Query Match 17.6%; Score 192; DB 1; Length 357; 

Best Local Similarity 33.7%; Pred. No. 4.66e-18;

Matches 30; Conservative 22; Mismatches 36; Indels 1; Gaps 1;

Db 48 FGPVCPFRCQCHLRVVQCSDLGLERVPKDLPPDTTLLDLQNNKITEIK-EGDFKNLKNLH 106

: :|:| |:|: II: :| :| :| I I |:|: : :;:: | || 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 107 ALILVNNKISKISPAAFAPLKKLERLYLS 135 

I I III: |: :| I II I |: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 10

ID PGS1_CANFA STANDARD; PRT; 369 AA.

AC O02678;

DT 15-JUL-1998 (REL. 36, CREATED)

DT 15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE BONE/CARTILAGE PROTEOGLYCAN I PRECURSOR (BIGLYCAN) (PG-S1).

GN BGN. 

OS CANIS FAMILIARIS (DOG). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC CARNIVORA; FISSIPEDIA; CANIDAE; CANIS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RA GLANT T.T.; 

RL SUBMITTED (DEC-1996) TO EMBL/GENBANK/DDBJ DATA BANKS. 

CC -!- TISSUE SPECIFICITY: FOUND IN THE EXTRACELLULAR MATRICES OF SEVERAL 

CC CONNECTIVE TISSUES, SPECIALLY IN ARTICULAR CARTILAGES. 

CC -!- PTM: THE TWO GLYCOSAMINOGLYCAN CHAINS ATTACHED TO BIGLYCAN CAN BE

CC EITHER CHONDROITIN SULFATE OR DERMATAN SULFATE (BY SIMILARITY). 

CC -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS 

CC FAMILY. 

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 12. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; 083 140; G1916846; -. 

DR PFAM; PF00560; LRR; 5. 

KW GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN; 

KW REPEAT; LEUCINE-REPEAT; SIGNAL.



FT SIGNAL       1     19 POTENTIAL.
FT PROPEP      20     37 BY SIMILARITY.
FT CHAIN       38    369
FT DOMAIN      72    343 LEUCINE-RICH REPEATS.
FT REPEAT      72     95 LRR 1.
FT REPEAT      96    116 LRR 2.
FT REPEAT     117    140 LRR 3.
FT REPEAT     141    164 LRR 4.
FT REPEAT     165    185 LRR 5.
FT REPEAT     186    211 LRR 6.
FT REPEAT     212    231 LRR 7.
FT REPEAT     232    255 LRR 8.
FT REPEAT     256    276 LRR 9.
FT REPEAT     277    302 LRR 10.
FT REPEAT     303    321 LRR 11.
FT REPEAT     336    343 LRR 12.
FT CARBOHYD    42     42 GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT CARBOHYD    48     48 GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT CARBOHYD   271    271 POTENTIAL.
FT CARBOHYD   312    312 POTENTIAL.
FT DISULFID    64     77 BY SIMILARITY.
FT DISULFID   322    355 BY SIMILARITY.
SQ SEQUENCE 369 AA; 41566 MW; F794CEEA CRC32;



Query Match 17.3%; Score 189; DB 1; Length 369; 

Best Local Similarity 35.2%; Pred. No. 1.78e-17;

Matches 31; Conservative 21; Mismatches 33; Indels 3; Gaps 3;

Db 62 AMCPFGCHCHLRVVQCSDLGLKAVPKEISP-DTMLLDLQNNDISELRAD-DFKGLHHLYA 119

I I hi: ll:::| |: I II I |:|| : : :: I :| 
Qy 68 SICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLL-LSGNNISTVDLNSNIHVLENLEX 126 

Db 120 LVLVNNKISKIHEKAFSPLRKLQKLYIS 147 

I I II I: l::|:| I II: I :: 
Qy 127 LDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 11 

ID PGS2_MOUSE STANDARD; PRT; 354 AA.

AC P28654;

DT 01-DEC-1992 (REL. 24, CREATED)

DT 01-DEC-1992 (REL. 24, LAST SEQUENCE UPDATE)

DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)
DE BONE PROTEOGLYCAN II PRECURSOR (PG-S2) (DECORIN) (PG40).
GN DCN.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=NIH SWISS; TISSUE=FIBROBLAST;

RA NAITOH Y., SUZUKI S.;

RL SUBMITTED (JUL-1990) TO EMBL/GENBANK/DDBJ DATA BANKS.

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 95050610. 

RA SCHOLZEN T., SOLURSH M., SUZUKI S., REITER R., MORGAN J.L.,

RA BUCHBERG A.M., SIRACUSA L.D., IOZZO R.V.; 

RT "The murine decorin. Complete cDNA cloning, genomic organization, 

RT chromosomal assignment, and expression during organogenesis and 

RT tissue differentiation."; 

RL J. BIOL. CHEM. 269:28270-28281(1994).

CC -!- FUNCTION: BINDS TO TYPE I AND TYPE II COLLAGEN AND AFFECTS THE
CC RATE OF FIBRILS FORMATION. ALSO BINDS TO FIBRONECTIN AND TGF-
CC BETA.
CC -!- PTM: THE GLYCOSAMINOGLYCAN CHAIN ATTACHED TO DECORIN CAN BE EITHER
CC CHONDROITIN SULFATE OR DERMATAN SULFATE DEPENDING UPON THE
CC TISSUE OF ORIGIN (BY SIMILARITY).
CC -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS
CC FAMILY.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 10.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; X53929; G53669; -. 

DR PIR; S20812; S20812. 

DR MGD; MGI:94872; DCN. 

DR PFAM; PF00560; LRR; 5. 

DR HSSP; P23945; 1XUN.

KW GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN;

KW REPEAT; LEUCINE-REPEAT; SIGNAL.



FT SIGNAL       1     16 POTENTIAL.
FT PROPEP      17     30
FT CHAIN       31    354 BONE PROTEOGLYCAN II.
FT DOMAIN      72    303 LEUCINE-RICH REPEATS.
FT REPEAT      72     93 LRR 1.
FT REPEAT      94    117 LRR 2.
FT REPEAT     118    140 LRR 3.
FT REPEAT     141    162 LRR 4.
FT REPEAT     163    188 LRR 5.
FT REPEAT     189    212 LRR 6.
FT REPEAT     213    233 LRR 7.
FT REPEAT     234    257 LRR 8.
FT REPEAT     258    280 LRR 9.
FT REPEAT     281    303 LRR 10.
FT DISULFID    49     62 BY SIMILARITY.
FT DISULFID   308    341 BY SIMILARITY.
FT CARBOHYD    34     34 GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT CARBOHYD   206    206 POTENTIAL.
FT CARBOHYD   241    241 POTENTIAL.
FT CARBOHYD   257    257 POTENTIAL.
FT CARBOHYD   298    298 POTENTIAL.
SQ SEQUENCE 354 AA; 39809 MW; F675E543 CRC32;



Query Match 16.8%; Score 184; DB 1; Length 354; 

Best Local Similarity 34.4%; Pred. No. 1.63e-16;

Matches 33; Conservative 23; Mismatches 37; Indels 3; Gaps 3;

Db 39 YDPDNPLI-SMCPYRCQCHLRVVQCSDLGLDKVPWDFPPDTTLLDLQNNKITEIK-EGAF 96

I :h :| 1:11 :|:| |:|: l|: :| :| :| I I |:|: : :: : 
Qy 60 Y-ADSCFIDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNI 118 

Db 97 RNLKDLHTLILVNNKISKISPEAFKPLVKLERLYLS 132 

I :| I I II h h :| I II I |: 
Qy 119 HVLENLEXLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 12
ID PGS2_RAT STANDARD; PRT; 354 AA.
AC Q01129;
DT 01-APR-1993 (REL. 25, CREATED)
DT 01-APR-1993 (REL. 25, LAST SEQUENCE UPDATE)
DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)
DE BONE PROTEOGLYCAN II PRECURSOR (PG-S2) (DECORIN) (PG40) (DERMATAN
DE SULFATE PROTEOGLYCAN-II) (DSPG).
GN DCN.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=UTERUS;
RX MEDLINE; 93003331.
RA ABRAMSON S.R., WOESSNER J.F.;
RT "cDNA sequence for rat dermatan sulfate proteoglycan-II (decorin).";
RL BIOCHIM. BIOPHYS. ACTA 1132:225-227(1992).
RN [2]
RP SEQUENCE OF 11-354 FROM N.A.
RX MEDLINE; 93154359.
RA ASUNDI V.K., DREHER K.L.;
RT "Molecular characterization of vascular smooth muscle decorin:
RT deduced core protein structure and regulation of gene expression.";
RL EUR. J. CELL BIOL. 59:314-321(1992).
RN [3]
RP SEQUENCE OF 31-48 AND 171-191.
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=UTERUS;
RX MEDLINE; 89350825.
RA KOKENYESI R., WOESSNER J.F.;
RT "Purification and characterization of a small dermatan sulphate
RT proteoglycan implicated in the dilatation of the rat uterine
RT cervix.";
RL BIOCHEM. J. 260:413-419(1989).

CC -!- FUNCTION: BINDS TO TYPE I AND TYPE II COLLAGEN AND AFFECTS THE
CC RATE OF FIBRILS FORMATION. ALSO BINDS TO FIBRONECTIN AND TGF-
CC BETA.
CC -!- FUNCTION: IMPLICATED IN THE DILATATION OF THE RAT CERVIX.
CC -!- DEVELOPMENTAL STAGE: THE AMOUNT OF DSPG PER CERVIX INCREASES
CC 4-FOLD DURING PREGNANCY, THEN FALLS PRECIPITOUSLY WITHIN 1 DAY
CC POST PARTUM.
CC -!- PTM: THE GLYCOSAMINOGLYCAN CHAIN ATTACHED TO DECORIN CAN BE EITHER
CC CHONDROITIN SULFATE OR DERMATAN SULFATE DEPENDING UPON THE
CC TISSUE OF ORIGIN (BY SIMILARITY).
CC -!- SIMILARITY: BELONGS TO THE SMALL INTERSTITIAL PROTEOGLYCANS
CC FAMILY.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 10.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC

DR EMBL; Z12298; G57550; -.

DR EMBL; X59859; G56057; -.
DR PIR; S29145; S29145.
DR PFAM; PF00560; LRR; 5.

DR HSSP; P23945; 1XUN. 

KW GLYCOPROTEIN; CONNECTIVE TISSUE; EXTRACELLULAR MATRIX; PROTEOGLYCAN; 

KW REPEAT; LEUCINE-REPEAT; SIGNAL. 



FT SIGNAL       1     16 POTENTIAL.
FT PROPEP      17     30
FT CHAIN       31    354 BONE PROTEOGLYCAN II.
FT DOMAIN      72    303 LEUCINE-RICH REPEATS.
FT REPEAT      72     93 LRR 1.
FT REPEAT      94    117 LRR 2.
FT REPEAT     118    140 LRR 3.
FT REPEAT     141    162 LRR 4.
FT REPEAT     163    188 LRR 5.
FT REPEAT     189    212 LRR 6.
FT REPEAT     213    233 LRR 7.
FT REPEAT     234    257 LRR 8.
FT REPEAT     258    280 LRR 9.
FT REPEAT     281    303 LRR 10.
FT DISULFID    49     62 BY SIMILARITY.
FT DISULFID   308    341 BY SIMILARITY.
FT CARBOHYD    34     34 GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT CARBOHYD   184    184 GLYCOSAMINOGLYCAN (BY SIMILARITY).
FT CARBOHYD   206    206 POTENTIAL.
FT CARBOHYD   241    241 POTENTIAL.
FT CARBOHYD   257    257 POTENTIAL.
FT CARBOHYD   298    298 POTENTIAL.
SQ SEQUENCE 354 AA; 39805 MW; 513FB9C1 CRC32;



Query Match 16.8%; Score 183; DB 1; Length 354; 

Best Local Similarity 34.4%; Pred. No. 2.53e-16;

Matches 33; Conservative 23; Mismatches 37; Indels 3; Gaps 3; 

Db 39 YDPDNPLI-SMCPYRCQCHLRWQCSDLGLDKVPWEFPPDTTLLDLQNNKITEIK-EGAF 96 

I :|: :| hll :h! |:|: II: :| :| :| I I |:|: : :: : 
Qy 60 Y-ADSCFIDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNI 118 

Db 97 KNLKDLHTLILVNNKISKISPEAFKPLVKLERLYLS 132 

I :l I I II I: I: Hill |: 
Qy 119 HVLENLEXLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 13

ID GPBB_PAPCY STANDARD; PRT; 208 AA.
AC Q04785;
DT 01-OCT-1993 (REL. 27, CREATED)
DT 01-OCT-1993 (REL. 27, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE PLATELET GLYCOPROTEIN IB BETA CHAIN PRECURSOR (GP-IB BETA)
DE (CD42B-BETA) (CD42C).
OS PAPIO CYNOCEPHALUS (YELLOW BABOON).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC PRIMATES; CATARRHINI; CERCOPITHECIDAE; CERCOPITHECINAE; PAPIO.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 93273245.
RA HAYZER D.J., SHOJI M., KIM T.M., RUNGE M.S., HANSON S.R.;
RT "Characterization of a cDNA encoding the beta-chain of baboon
RT receptor glycoprotein GPIb.";
RL GENE 127:271-272(1993).
CC -!- FUNCTION: GP-IB, A SURFACE MEMBRANE PROTEIN OF PLATELETS,
CC PARTICIPATES IN THE FORMATION OF PLATELET PLUGS BY BINDING TO VON
CC WILLEBRAND FACTOR, WHICH IS ALREADY BOUND TO THE SUBENDOTHELIUM.
CC -!- SUBUNIT: GP-IB ALPHA AND BETA ARE DISULFIDE LINKED. GP-IX IS
CC COMPLEXED WITH THE GP-IB HETERODIMER VIA A NON COVALENT LINKAGE.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- PLATELET ACTIVATION APPARENTLY INVOLVES DISRUPTION OF THE
CC MACROMOLECULAR COMPLEX OF GP-IB WITH THE PLATELET GLYCOPROTEIN IX
CC (GP-IX) AND DISSOCIATION OF GP-IB FROM THE ACTIN-BINDING PROTEIN.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 1.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; L05927; G176582; -.
DR PFAM; PF00560; LRR; 1.
KW PLATELET; TRANSMEMBRANE; GLYCOPROTEIN; HEMOSTASIS; BLOOD COAGULATION;
KW SIGNAL; PHOSPHORYLATION; CELL ADHESION; LEUCINE-REPEAT.
FT SIGNAL 1 26 BY SIMILARITY.
FT CHAIN 27 208 PLATELET GLYCOPROTEIN IB BETA CHAIN.
FT DOMAIN 27 147 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 148 172 BY SIMILARITY.
FT DOMAIN 173 208 CYTOPLASMIC (POTENTIAL).
FT CARBOHYD 66 66 POTENTIAL.
FT MOD_RES 193 193 PHOSPHORYLATION (BY CAPK)
FT (BY SIMILARITY).
SQ SEQUENCE 208 AA; 21984 MW; 067DDEA5 CRC32;



Query Match 14.8%; Score 162; DB 1; Length 208;

Best Local Similarity 38.5%; Pred. No. 2.24e-12; 

Matches 25; Conservative 18; Mismatches 19; Indels 3; Gaps 2; 

Db 26 CPAPCSCAGTLVDCGRRGLTWASLPTSFPVHTTELVLTGNNLTALP-SGLLDALPAVRTA 84 

II.: I I II 111:111 ::lll:l : 1 : 1 : 1 : 1 1 1 : : : : :: : I : 
Qy 70 CPTQCDCYGTTVDCNKRGLN-TIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLEXL 127

Db 85 HLGAN 89

I: I
Qy 128 DLSNN 132



RESULT 14 

ID GPBB_HUMAN STANDARD; PRT; 206 AA.

AC P13224;

DT 01-JAN-1990 (REL. 13, CREATED)

DT 01-JAN-1990 (REL. 13, LAST SEQUENCE UPDATE)

DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)

DE PLATELET GLYCOPROTEIN IB BETA CHAIN PRECURSOR (GP-IB BETA) (GPIBB) 

DE (CD42B-BETA) (CD42C). 

GN GPIBB. 

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A.

RX MEDLINE; 88176901.

RA LOPEZ J.A., CHUNG D.W., FUJIKAWA K., HAGEN F.S., DAVIE E.W.,

RA ROTH G.J.;

RT "The alpha and beta chains of human platelet glycoprotein Ib are both

RT transmembrane proteins containing a leucine-rich amino acid 

RT sequence."; 

RL PROC. NATL. ACAD. SCI. U.S.A. 85:2135-2139(1988). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC TISSUE-BRAIN; 

RX MEDLINE; 94292494. 

RA YAGI M., EDELHOFF S., DISTECHE C.M., ROTH G.J.;

RT "Structural characterization and chromosomal location of the gene

RT encoding human platelet glycoprotein Ib beta.";

RL J. BIOL. CHEM. 269:17424-17427(1994).

RN [3] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 97174353.

RA ZIEGER B., HASHIMOTO Y., WARE J.;

RT "Alternative expression of platelet glycoprotein Ib(beta) mRNA from an

RT adjacent 5' gene with an imperfect polyadenylation signal sequence."; 

RL J. CLIN. INVEST. 99:520-525(1997). 

RN [4] 

RP PHOSPHORYLATION, AND SEQUENCE OF 186-200. 

RX MEDLINE; 89359414.

RA WARDELL M.R., REYNOLDS C.C., BERNDT M.C., WALLACE R.W., FOX J.E.B.;

RT "Platelet glycoprotein Ib beta is phosphorylated on serine 166 by

RT cyclic AMP-dependent protein kinase.";

RL J. BIOL. CHEM. 264:15656-15661(1989).

RN [5]

RP SEQUENCE OF 27-40.

RX MEDLINE; 87326405.

RA CANFIELD V.A., OZOLS J., NUGENT D., ROTH G.J.;

RT "Isolation and characterization of the alpha and beta chains of human

RT platelet glycoprotein Ib.";

RL BIOCHEM. BIOPHYS. RES. COMMUN. 147:526-534(1987). 

CC -!- FUNCTION: GP-IB, A SURFACE MEMBRANE PROTEIN OF PLATELETS,

CC PARTICIPATES IN THE FORMATION OF PLATELET PLUGS BY BINDING TO VON

CC WILLEBRAND FACTOR, WHICH IS ALREADY BOUND TO THE SUBENDOTHELIUM.

CC -!- SUBUNIT: GP-IB ALPHA AND BETA ARE DISULFIDE LINKED. GP-IX IS

CC COMPLEXED WITH THE GP-IB HETERODIMER VIA A NON COVALENT LINKAGE.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- PLATELET ACTIVATION APPARENTLY INVOLVES DISRUPTION OF THE

CC MACROMOLECULAR COMPLEX OF GP-IB WITH THE PLATELET GLYCOPROTEIN IX

CC (GP-IX) AND DISSOCIATION OF GP-IB FROM THE ACTIN-BINDING PROTEIN.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN

CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 1.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 



CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; J03259; G306792; -. 

DR EMBL; AF006988; G2978512; -. 

DR EMBL; U59632; G1809266; -. 

DR PIR; B26864; B26864. 

DR PIR; A31929; A31929. 

DR MIM; 138720; -. 

DR PFAM; PF00560; LRR; 1. 

KW PLATELET; TRANSMEMBRANE; GLYCOPROTEIN; HEMOSTASIS; BLOOD COAGULATION; 

KW SIGNAL; PHOSPHORYLATION; CELL ADHESION; LEUCINE-REPEAT. 

FT SIGNAL 1 26 

FT CHAIN 27 206 PLATELET GLYCOPROTEIN IB BETA CHAIN. 

FT DOMAIN 27 147 EXTRACELLULAR (POTENTIAL). 

FT TRANSMEM 148 172 POTENTIAL. 

FT DOMAIN 173 206 CYTOPLASMIC (POTENTIAL). 

FT REPEAT 60 83 LRR. 

FT CARBOHYD 66 66 POTENTIAL. 

FT MOD_RES 191 191 PHOSPHORYLATION (BY CAPK).

SQ SEQUENCE 206 AA; 21717 MW; A9591D41 CRC32;

Query Match 14.6%; Score 159; DB 1; Length 206;

Best Local Similarity 38.5%; Pred. No. 7.95e-12;

Matches 25; Conservative 17; Mismatches 20; Indels 3; Gaps 2; 

Db 26 CPAPCSCAGTLVDCGRRGLTWASLPTAFPVDTTELVLTGNNLTALP-PGLLDALPALRTA 84 

II: I I II III :IH ::||::| : 1 : 1 : 1 : 1 1 1 : : : : : : II 
Qy 70 CPTQCDCYGTTVDCNKRGLN-TIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLEXL 127 

Db 85 HLGAN 89 
I: I 

Qy 128 DLSNN 132 



RESULT 15 

ID GPBB_MOUSE STANDARD; PRT; 206 AA.

AC P56400; 

DT 15-JUL-1998 (REL. 36, CREATED) 

DT 15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE PLATELET GLYCOPROTEIN IB BETA CHAIN PRECURSOR (GP-IB BETA) (GPIBB). 

GN GPIBB.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A.

RX MEDLINE; 97403789.

RA KITAGUCHI T., MURATA M., ANBO H., MORIKI T., IKEDA Y.;

RT "Characterization of the gene encoding mouse platelet glycoprotein Ib

RT beta.";

RL THROMB. RES. 87:235-244(1997). 

CC -!- FUNCTION: GP-IB, A SURFACE MEMBRANE PROTEIN OF PLATELETS, 

CC PARTICIPATES IN THE FORMATION OF PLATELET PLUGS BY BINDING TO VON 

CC WILLEBRAND FACTOR, WHICH IS ALREADY BOUND TO THE SUBENDOTHELIUM 

CC (BY SIMILARITY). 

CC -!- SUBUNIT: GP-IB ALPHA AND BETA ARE DISULFIDE LINKED. GP-IX IS

CC COMPLEXED WITH THE GP-IB HETERODIMER VIA A NON COVALENT LINKAGE

CC (BY SIMILARITY).

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- PLATELET ACTIVATION APPARENTLY INVOLVES DISRUPTION OF THE

CC MACROMOLECULAR COMPLEX OF GP-IB WITH THE PLATELET GLYCOPROTEIN IX 

CC (GP-IX) AND DISSOCIATION OF GP-IB FROM THE ACTIN-BINDING PROTEIN. 

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN 

CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 1. 

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AB001419; D1023289; -. 

DR MGD; MGI:107852; GP1BB. 

KW PLATELET; TRANSMEMBRANE; GLYCOPROTEIN; HEMOSTASIS; BLOOD COAGULATION; 

KW SIGNAL; PHOSPHORYLATION; CELL ADHESION; LEUCINE-REPEAT. 



FT SIGNAL 1 26 BY SIMILARITY.
FT CHAIN 27 206 PLATELET GLYCOPROTEIN IB BETA CHAIN.
FT DOMAIN 27 147 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 148 172 POTENTIAL.
FT DOMAIN 173 206 CYTOPLASMIC (POTENTIAL).
FT REPEAT 60 83 LRR.
FT CARBOHYD 66 66 POTENTIAL.
FT MOD_RES 191 191 PHOSPHORYLATION (BY CAPK) (BY
FT SIMILARITY).
SQ SEQUENCE 206 AA; 21762 MW; 4D3362F4 CRC32;

Query Match 14.6%; Score 159; DB 1; Length 206;

Best Local Similarity 36.9%; Pred. No. 7.95e-12;
Matches 24; Conservative 18; Mismatches 20; Indels 3; Gaps 2;

Db 26 CPAPCSCAGTLVDCGRRGLTWASLPAAFPPDTTELVLTGNNLTALP-PGLLDALPALRAA 84 

II: I I II III :||| ::|:::| : 1 : 1 : 1 : 1 1 1 : : : : : : I I 
Qy 70 CPTQCDCYGTTVDCNKRGLN-TIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLEXL 127 

Db 85 HLGAN 89 

I: I 

Qy 128 DLSNN 132 



Search completed: Fri May 28 09:05:09 1999
Job time : 27 secs.
Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 09:05:28 1999; MasPar time 12.78 Seconds 

662.015 Million cell updates/sec 

Tabular output not generated. 



Title:           MJS-09-191-647-8
Description:     (1-155) from US09191647.pep
Perfect Score:   1092
Sequence:        1 RNPXICDCNLQWLAQINLQK...FINDKSFEKLSKLRELXLND 155

Scoring table:   PAM 150
                 Gap 11

Searched:        179066 seqs, 54579741 residues

Post-processing: Minimum Match 0%
                 Listing first 45 summaries

Database:        sptrembl9
                 1:sp_archea 2:sp_bacteria 3:sp_fungi 4:sp_human
                 5:sp_invertebrate 6:sp_mammal 7:sp_mhc 8:sp_organelle
                 9:sp_phage 10:sp_plant 11:sp_rodent 12:sp_unclassified
                 13:sp_vertebrate 14:sp_virus

Statistics:      Mean 42.369; Variance 75.870; scale 0.558



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution.



Result        Query
   No. Score  Match Length DB  ID      Description             Pred. No.
     1   497   45.5   1523 11  088280  MEGF5.                  1.49e-82
     2   395   36.2   1531 11  088279  MEGF4.                  4.57e-60
     3   213   19.5    167  5  Q21042  F59F3.7.                5.83e-22
     4   202   18.5    224  5  044086  ZK994.4 PROTEIN.        8.27e-20
     5   201   18.4    420  4  Q13641  5T4 ONCOFETAL ANTIGEN   1.29e-19
     6   197   18.0    811  4  075139  KIAA0644 PROTEIN.       7.69e-19
     7   195   17.9    360  6  046542  DERMATAN SULFATE PROTE  1.87e-18
     8   195   17.9    360  6  Q28888  DECORIN.                1.87e-18
     9   193   17.7    369  6  046390  BIGLYCAN PRECURSOR.     4.53e-18
    10   192   17.6    372  6  046403  BIGLYCAN.               7.05e-18
    11   191   17.5    321  6  P79119  EPIPHYCAN.              1.10e-17
    12   191   17.5    322  4  Q99645  DERMATAN SULFATE PROTE  1.10e-17
    13   184   16.8    322 11  P70186  PROTEOGLYCAN PRECURSOR  2.37e-16
    14   182   16.7    316 13  Q90944  PROTEOGLYCAN CORE PROT  5.67e-16
    15   182   16.7   1385  5  Q26388  TLR-TOLL-LIKE RECEPTOR  5.67e-16
    16   182   16.7   1389  5  Q24591  WHEELER.                5.67e-16
    17   181   16.6    718 13  073675  NEURONAL LEUCINE-RICH   8.76e-16
    18   178   16.3    522  4  043354  BAC CLONE GS099H08, CO  3.22e-15
    19   173   15.8   1496  4  Q92626  MYELOBLAST KIAA0230 (F  2.78e-14
    20   169   15.5    705  4  043377  BAC CLONE RG118D07 FRO  1.54e-13
    21   168   15.4    733  5  Q24250  TARTAN PROTEIN PRECURS  2.35e-13
    22   167   15.3    516  4  043300  KIAA0416.               3.60e-13
    23   167   15.3    707 11  P97860  LEUCINE-RICH REPEAT PR  3.60e-13
    24   166   15.2    713  4  075325  GLIOMA AMPLIFIED ON CH  5.50e-13
    25   164   15.0     96 11  Q63156  DECORIN (FRAGMENT).     1.28e-12
    26   163   14.9    423 11  035103  OSTEOMODULIN.           1.95e-12
    27   161   14.7    421  4  Q99983  OSTEOMODULIN, COMPLETE  4.53e-12
    28   161   14.7   1091 11  P70193  MEMBRANE GLYCOPROTEIN.  4.53e-12
    29   159   14.6    411  4  Q14422  GLYCOPROTEIN IB BETA.   1.05e-11
    30   159   14.6    907  4  075473  ORPHAN G PROTEIN-COUPL  1.05e-11
    31   157   14.4    716 11  061809  LEUCINE-RICH-REPEAT PR  2.41e-11
    32   156   14.3   1535  5  Q23991  PEROXIDASE PRECURSOR.   3.65e-11
    33   151   13.8    892  5  P91644  KEK2 PRECURSOR (FRAGME  2.88e-10
    34   150   13.7    739  4  075094  MEGF5 (FRAGMENT).       4.34e-10
    35   147   13.5    428  4  014498  ISLR PRECURSOR.         1.48e-09
    36   144   13.2    661 11  062192  LYMPHOCYTE ANTIGEN 78   4.99e-09
    37   143   13.1    683  5  Q22187  T05A1.3 PROTEIN.        7.46e-09
    38   142   13.0    526 10  022753  PREDICTED LEUCINE-RICH  1.12e-08
    39   142   13.0    880  5  P91643  KEK1 PRECURSOR.         1.12e-08
    40   139   12.7    135  6  046377  BIGLYCAN (FRAGMENT).    3.70e-08
    41   139   12.7    230  6  029396  LEUCINE-RICH GLYCOPROT  3.70e-08
    42   138   12.6   4293 11  008852  POLYCYSTIC KIDNEY DISE  5.51e-08
    43   136   12.5    358 11  055226  CHONDROADHERIN.         1.22e-07
    44   136   12.5    422  6  077742  OSTEOADHERIN.           1.22e-07
    45   136   12.5    661  4  099467  LYMPHOCYTE ANTIGEN 64   1.22e-07



RESULT 1

ID 088280 PRELIMINARY; PRT; 1523 AA.
AC 088280;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF5.
GN MEGF5.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN-SPRAGUE-DAWLEY; TISSUE-BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011531; D1033424; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1523 AA; 167767 MW; 2BD845D0 CRC32;



Query Match 45.5%; Score 497; DB 11; Length 1523; 

Best Local Similarity 40.9%; Pred. No. 1.49e-82;

Matches 63; Conservative 45; Mismatches 44; Indels 2; Gaps 2; 

Db 436 QNPFVCDCHLKWLAD-YLQDNPIETSGARCSSPRRLANKRISQIKSKKFRCSGSEDYRNR 494 

:M :IH:I III: II I llllllll |:|| :|::: : ::||:| III : : 
Qy 1 RNPXICDCNLQWLAQINLQKN-IETSGARCEQPKRLRKKKFATLPPNKFKCKGSESFVSM 59 

Db 495 FSSECFMDLVCPEKCRCEGTIVDCSNQKLSRIPSHLPEYTTDLRLNDNDIAVLEATGIFK 554 

:: MM Ml I I II III::: |: ||: :| ::|:| |: |:|: :: : : 
Qy 60 YADSCFIDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIH 119 

Db 555 KLPNLRKINLSNNRIKEVREGAFDGAAGVQELML 588 

I II ::||||:| : : :|: : ::|| I 
Qy 120 VLENLEXLDLSNNHITFINDKSFEKLSKLRELXL 153



RESULT 2

ID 088279 PRELIMINARY; PRT; 1531 AA.
AC 088279;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF4.
GN MEGF4.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN-SPRAGUE-DAWLEY; TISSUE-BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011530; D1033423; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1531 AA; 167497 MW; 5C5EBDF4 CRC32;

Query Match 36.2%; Score 395; DB 11; Length 1531;

Best Local Similarity 39.4%; Pred. No. 4.57e-60;

Matches 63; Conservative 41; Mismatches 48; Indels 8; Gaps 7; 

Db 438 QNPFICDCNLKWLADF-LRTNPIETTGARCASPRRLANKRIGQIKSKKFRCSAKEQYPIP 496

:l! Illlll Mh: I: I llhllll hi :|::: : ::||:| : | ]:: 
Qy 1 RNPXICDCNLQWLAQINLQKN-IETSGARCEQPKRLRKKKFATLPPNKFKCKGSES-FVS 58 

Db 497 GTEDYHLNSECTSDVACPHKCRCEASWECSGLKLSKIPERIPQSTTELRLNNNEISILE 556 

1:1111111:: hi: |: II II: :|:| |: hi :: 
Qy 59 -M--YA-DS-CFIDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVD 113

Db 557 ATGLFKKLSHLKKINLSNNKVSEIEDGTFEGATSVSELHL 596 

: : I :| ::IMI |:| :|| : ; II I 
Qy 114 LNSNIHVLENLEXLDLSNNHITFINDKSFEKLSKLRELXL 153



RESULT 3 

ID Q21042 PRELIMINARY; PRT; 167 AA. 

AC Q21042; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE F59F3.7.

OS CAENORHABDITIS ELEGANS.

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA;
OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS.

RN [1] 

RP SEQUENCE FROM N.A. 

RA KERSHAW J.; 

RL SUBMITTED (NOV-1995) TO EMBL/GENBANK/DDBJ DATA BANKS.

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 94150718. 

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,
RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,
RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,
RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,
RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,
RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,
RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,
RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,
RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,
RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;
RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.
RT elegans.";

RL NATURE 368:32-38(1994). 

DR EMBL; Z68005; G1070069; -. 



DR PFAM; PF00560; LRR; 2. 

SQ SEQUENCE 167 AA; 18567 MW; 4F0F09BF CRC32; 

Query Match 19.5%; Score 213; DB 5; Length 167; 

Best Local Similarity 36.0%; Pred. No. 5.83e-22;

Matches 32; Conservative 25; Mismatches 31; Indels 1; Gaps 1; 

Db 13 LSTACPSECRCAGLDVHCEGKNLTAIPGHIPIATTNLYFSNNLLNSLS-KSNFQALPNLQ 71 

: : h: II I |: : I :|| I :| I :| I :::: :||:: | h 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 72 YLDLSNNSIRDIEETLLDSFPGLKYLDLS 100 

Illlll I I:: :: :: h I h 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 4 

ID 044086 PRELIMINARY; PRT; 224 AA. 

AC 044086; 

DT 01-JUN-1998 (TREMBLREL. 06, CREATED) 

DT 01-JUN-1998 (TREMBLREL. 06, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE ZK994.4 PROTEIN. 

GN ZK994.4. 

OS CAENORHABDITIS ELEGANS. 

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA; 

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS. 

RN [1]

RP SEQUENCE FROM N.A. 

RC STRAIN-BRISTOL N2; 

RX MEDLINE; 94150718. 

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,
RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,
RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,
RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,
RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,
RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,
RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,
RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,
RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,
RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;
RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.
RT elegans.";
RL NATURE 368:32-38(1994).
RN [2]
RP SEQUENCE FROM N.A.
RC STRAIN-BRISTOL N2;
RA DAVIDSON S., WOHLDMANN P.;
RL SUBMITTED (DEC-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.
RN [3]
RP SEQUENCE FROM N.A.
RC STRAIN-BRISTOL N2;
RA WATERSTON R.;
RL SUBMITTED (SEP-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; AF022977; G2668616; -. 

SQ SEQUENCE 224 AA; 25885 MW; 3720F7F5 CRC32; 

Query Match 18.5%; Score 202; DB 5; Length 224;

Best Local Similarity 32.6%; Pred. No. 8.27e-20;

Matches 30; Conservative 24; Mismatches 34; Indels 4; Gaps 3; 

Db 17 FVAGLECPVECTCDKKGLWDCSSSGLTRIPKNISRNVRSLVIRNNRIHKLK-RSDLEGF 75 

I: :: II :| I I III: II II :|:| |:: I I : |:: : 
Qy 65 FIDSI-CPTQCDCY--GTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVL 121 

Db 76 NQLETLVLTHNKIKIIEENVLDHLPELKRLSL 107

: II I l::l I :|::: :: |: |: I I 

Qy 122 ENLEXLDLSNNHITFINDKSFEKLSKLRELXL 153 



RESULT 5 

ID Q13641 PRELIMINARY; PRT; 420 AA.
AC Q13641;
DT 01-NOV-1996 (TREMBLREL. 01, CREATED)
DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE 5T4 ONCOFETAL ANTIGEN PRECURSOR.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE-PLACENTA;
RX MEDLINE; 94179356.
RA MYERS K.A., RAHI-SAUND V., DAVISON M.D., YOUNG J.A., CHEATER A.J.,
RA STERN P.L.;
RT "Isolation of a cDNA encoding 5T4 oncofetal trophoblast glycoprotein.
RT An antigen associated with metastasis contains leucine-rich
RT repeats.";
RL J. BIOL. CHEM. 269:9319-9324(1994).
DR EMBL; Z29083; G435655; -.
DR PFAM; PF00560; LRR; 4.
KW SIGNAL.
FT SIGNAL 1 31 POTENTIAL.
FT CHAIN 32 420 5T4 ONCOFETAL ANTIGEN.
SQ SEQUENCE 420 AA; 46031 MW; 43633112 CRC32;

Query Match 18.4%; Score 201; DB 4; Length 420;

Best Local Similarity 31.5%; Pred. No. 1.29e-19;

Matches 28; Conservative 25; Mismatches 32; Indels 4; Gaps 4;

Db 62 CPALCECSEAARTVKCVNRNLTEVPTDLPAYVRNLFLTGNQLAVLPAGAFARRPPLAELA 121
II: hi - II I :l I :ll =1 : 1 : 1 : 1 1 :: : : : I :| 
Qy 70 CPTQCDCY-GT-TVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNS-NIHV-LENLE 125 

Db 122 ALNLSGSRLDEVRAGAFEHLPSLRQLDLS 150 

1:11 -: : Ml I: l|:| |: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 6

ID 075139 PRELIMINARY; PRT; 811 AA.
AC 075139;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE KIAA0644 PROTEIN.
GN KIAA0644.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE-BRAIN;
RX MEDLINE; 98403880.
RA ISHIKAWA K., NAGASE T., SUYAMA M., MIYAJIMA N., TANAKA A., KOTANI H.,
RA NOMURA N., OHARA O.;
RT "Prediction of the coding sequences of unidentified human genes. X.
RT The complete sequences of 100 new cDNA clones from brain which can
RT code for large proteins in vitro.";
RL DNA RES. 5:169-176(1998).
DR EMBL; AB014544; D1032580; -.
SQ SEQUENCE 811 AA; 88695 MW; C8B8C147 CRC32;

Query Match 18.0%; Score 197; DB 4; Length 811;

Best Local Similarity 38.3%; Pred. No. 7.69e-19;

Matches 36; Conservative 19; Mismatches 33; Indels 6; Gaps 5;

Db 24 EPVCPERCDCQHPQHLLCTNRGLRWPKTSSLPSPHDVLTYSLGGNFITNIT-AFDFHRL 82
:::H Mil : I :IM :| II :| I 1 : 1 1 h : ::| I 
Qy 67 DSICPTQCDC-YGTTVDCNKRGLNTIP-TS-IPRFATQLL--LSGNNISTVDLNSNIHVL 121 

Db 83 GQLRRLDLQYNQIRSLHPKTFEKLSRLEELYLGN 116 

I III 1:1 :: hllllhl II I : 
Qy 122 ENLEXLDLSNNHITFINDKSFEKLSKLRELXLND 155 



RESULT 7

ID 046542 PRELIMINARY; PRT; 360 AA.

AC 046542; 

DT 01-JUN-1998 (TREMBLREL. 06, CREATED) 

DT 01-JUN-1998 (TREMBLREL. 06, LAST SEQUENCE UPDATE) 

DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE) 

DE DERMATAN SULFATE PROTEOGLYCAN II. 

OS EQUUS CABALLUS (HORSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC PERISSODACTYLA; EQUIDAE; EQUUS.

RN [1]

RP SEQUENCE FROM N.A.

RA RICHARDSON D.W., DODGE G.R.;

RL SUBMITTED (DEC-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; AF038127; G2723947; -. 

SQ SEQUENCE 360 AA; 39939 MW; A3C42C76 CRC32;

Query Match 17.9%; Score 195; DB 6; Length 360;

Best Local Similarity 33.7%; Pred. No. 1.87e-18;

Matches 30; Conservative 22; Mismatches 36; Indels 1; Gaps 1;

Db 51 LGPVCPFRCQCHLRWQCSDLGLDKVPKDLPPDTTLLDLQNNKITEIK-DGDFKNLKNLH 109
: ::M :hl l:|: II: :| :| :| I I |:|: : :::: | || 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 110 ALILVNNKISKISPGAFTPLVKLERLYLS 138 

I I II I: I: :| I II I I: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 8

ID Q28888 PRELIMINARY; PRT; 360 AA.
AC Q28888; Q28608;
DT 01-NOV-1996 (TREMBLREL. 01, CREATED)
DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE DECORIN.
OS ORYCTOLAGUS CUNICULUS (RABBIT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC LAGOMORPHA; LEPORIDAE; ORYCTOLAGUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 95122319.
RA ZHAN Q., BURROWS R., CINTRON C.;
RT "Cloning and in situ hybridization of rabbit decorin in corneal
RT tissues.";
RL INVEST. OPHTHALMOL. VIS. SCI. 36:206-215(1995).
RN [2]
RP SEQUENCE OF 38-358 FROM N.A.
RC TISSUE-CARTILAGE;
RA HERING T.M., KOLLAR J.;
RL SUBMITTED (NOV-1993) TO EMBL/GENBANK/DDBJ DATA BANKS.
DR EMBL; S76584; G913375; -.
DR EMBL; U03394; G415647; -.
DR PFAM; PF00560; LRR; 5.
SQ SEQUENCE 360 AA; 39896 MW; 1FAB2F8E CRC32;

Query Match 17.9%; Score 195; DB 6; Length 360;

Best Local Similarity 33.7%; Pred. No. 1.87e-18;

Matches 30; Conservative 22; Mismatches 36; Indels 1; Gaps 1; 

Db 51 LGPVCPFRCQCHLRWQCSDLGLDKVPKDLPPDTTLLDLQNNKITEIK-DGDFKNLKNLH 109 

: ::|| :|:| |:|: l|: :| :| :| I I |:|: : :::: | || 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 110 ALILVNNKISKISPGAFTPLVKLERLYLS 138 

I I II I: I : :| I II I |: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 9

ID 046390 PRELIMINARY; PRT; 369 AA.
AC 046390;
DT 01-JUN-1998 (TREMBLREL. 06, CREATED)
DT 01-JUN-1998 (TREMBLREL. 06, LAST SEQUENCE UPDATE)
DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE)
DE BIGLYCAN PRECURSOR.
OS OVIS ARIES (SHEEP).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC ARTIODACTYLA; RUMINANTIA; PECORA; BOVOIDEA; BOVIDAE; CAPRINAE; OVIS.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE-CHOROID PLEXUS;
RA BRUETT L., CLEMENTS J.E.;
RL SUBMITTED (NOV-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.
DR EMBL; AF034842; G2655356; -.
KW SIGNAL.
FT SIGNAL 1 18 POTENTIAL.
FT CHAIN 38 369 BIGLYCAN.
SQ SEQUENCE 369 AA; 41523 MW; A0A9F549 CRC32;

Query Match 17.7%; Score 193; DB 6; Length 369;

Best Local Similarity 34.5%; Pred. No. 4.53e-18;
Matches 30; Conservative 23; Mismatches 33; Indels 1; Gaps 1;

Db 62 AMCPFGCHCHLRWQCSDLGLKAVPKEISPDTTLLDLQNNDISELR-KDDFRGLQHLYAL 120
::M I I 1:1: ll = ::| I: :! I I |:ll : : :: l"l I 
Qy 68 SICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLEXL 127 

Db 121 VLVNNKISKIHEKAFSPLRKLQKLYIS 147

I II I: I::hl I l|: I :: 
Qy 128 DLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 10 

ID 046403 PRELIMINARY; PRT; 372 AA. 

AC 046403; 

DT 01-JUN-1998 (TREMBLREL. 06, CREATED)

DT 01-JUN-1998 (TREMBLREL. 06, LAST SEQUENCE UPDATE) 

DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE) 

DE BIGLYCAN. 

OS EQUUS CABALLUS (HORSE). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC PERISSODACTYLA; EQUIDAE; EQUUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RA RICHARDSON D.W., DODGE G.R.; 

RL SUBMITTED (NOV-1997) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; AF035934; G2662531; -.

SQ SEQUENCE 372 AA; 41924 MW; 097E7BA6 CRC32;

Query Match 17.6%; Score 192; DB 6; Length 372;

Best Local Similarity 33.7%; Pred. No. 7.05e-18;
Matches 30; Conservative 24; Mismatches 34; Indels 1; Gaps 1;

Db 63 FSAMCPFGCHCHLRWQCSDLGLKAVPKEISPDTTLLDLQNNEISELR-KDDFKGLQHLY 121 

: I I hh ll:::l h :l I I 1 = 11 : : " |::| 
Qy 66 IDSICPTQCDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLE 125 

Db 122 ALVLVNNKISKIHEKAFSPLRKLQKLYIS 150 

I I II I: I::|:| I II: I :: 
Qy 126 XLDLSNNHITFINDKSFEKLSKLRELXLN 154 



RESULT 11 

ID P79119 PRELIMINARY; PRT; 321 AA. 

AC P79119; 

DT 01-MAY-1997 (TREMBLREL. 03, CREATED) 

DT 01-MAY-1997 (TREMBLREL. 03, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE EPIPHYCAN. 

OS BOS TAURUS (BOVINE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC ARTIODACTYLA; RUMINANTIA; PECORA; BOVOIDEA; BOVIDAE; BOVINAE; BOS. 

RN [1] 



RP SEQUENCE FROM N.A.

RX MEDLINE; 97373567. 

RA JOHNSON H.J., ROSENBERG L., CHOI H.U., GARZA S., HOOK M., NEAME P.J.; 

RT "Characterization of epiphycan, a small proteoglycan with a 

RT leucine-rich repeat core protein.";

RL J. BIOL. CHEM. 272:18709-18717(1997). 

DR EMBL; U77127; G1679745; -. 

DR PFAM; PF00560; LRR; 3. 

SQ SEQUENCE 321 AA; 36687 MW; D1B0F07E CRC32; 

Query Match 17.5%; Score 191; DB 6; Length 321; 

Best Local Similarity 34.1%; Pred. No. 1.10e-17; 

Matches 28; Conservative 27; Mismatches 25; Indels 2; Gaps 2; 

Db 120 CTCISTTVYCDDHELDAIP-PLPKNTAYFYSRFNRIKKIN-KNDFASLNDLRRIDLTSNL 177 

I I :||| I: : |::|| ::|: :: : II:: :::: |::| :||::| 
Qy 74 CDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLEXLDLSNNH 133 

Db 178 ISEIDEDAFRKLPQLRELVLRD 199 

I: I:: :| II: Mil I I 
Qy 134 ITFINDKSFEKLSKLRELXLND 155 



RESULT 12 

ID Q99645 PRELIMINARY; PRT; 322 AA. 

AC Q99645; 

DT 01-MAY-1997 (TREMBLREL. 03, CREATED) 

DT 01-MAY-1997 (TREMBLREL. 03, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE DERMATAN SULFATE PROTEOGLYCAN 3. 

GN DSPG3. 

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 

OC CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=CHONDROCYTE; 

RX MEDLINE; 97131519. 

RA DEERE M., JOHNSON J., GARZA S., HARRISON W.R., YOON S.J., 

RA ELDER F.F.B., KUCHERLAPATI R., HOOK M., HECHT J.T.; 

RT "Characterization of human DSPG3, a small dermatan sulfate 

RT proteoglycan."; 

RL GENOMICS 38:399-404(1996). 

DR EMBL; U59111; G1794209; -. 

DR PFAM; PF00560; LRR; 3. 

SQ SEQUENCE 322 AA; 36607 MW; DCB533EB CRC32; 

Query Match 17.5%; Score 191; DB 4; Length 322; 

Best Local Similarity 34.1%; Pred. No. 1.10e-17; 

Matches 28; Conservative 26; Mismatches 26; Indels 2; Gaps 2; 

Db 121 CTCISTTVYCDDHELDAIP-PLPKNTAYFYSRFNRIKKIN-KNDFASLSDLKRIDLTSNL 178 

I I :||l I: : |::|| ::|: :: : I I :::: I :| :||::| 
Qy 74 CDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLEXLDLSNNH 133 

Db 179 ISEIDEDAFRKLPQLRELVLRD 200 

I: I:: :l II: MM I I 
Qy 134 ITFINDKSFEKLSKLRELXLND 155 



RESULT 13 

ID P70186 PRELIMINARY; PRT; 322 AA. 

AC P70186; 

DT 01-FEB-1997 (TREMBLREL. 02, CREATED) 

DT 01-FEB-1997 (TREMBLREL. 02, LAST SEQUENCE UPDATE) 

DT 01-JAN-1999 (TREMBLREL. 09, LAST ANNOTATION UPDATE) 

DE PROTEOGLYCAN PRECURSOR. 

GN PG-LB. 

OS MUS MUSCULUS (MOUSE). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A. 
RC STRAIN=BALB/C; TISSUE=EPIPHYSEAL CARTILAGE; 

RX MEDLINE; 96433109. 

RA KURITA K., SHINOMURA T., UJITA M., ZAKO M., KIDA D., IWATA H., 

RA KIMATA K.; 

RT "Occurence of PG-Lb, a leucine-rich small chondroitin/dermatan 

RT sulphate proteoglycan in mammalian epiphyseal cartilage: molecular 

RT cloning and sequence analysis of the mouse cDNA."; 

RL BIOCHEM. J. 318:909-914(1996). 

DR EMBL; D78274; D1011999; -. 

DR MGD; MGI:107942; DSPG3. 

DR PFAM; PF00560; LRR; 3. 

KW SIGNAL; PROTEOGLYCAN. 

FT SIGNAL 1 25 POTENTIAL. 

FT CHAIN 26 322 PROTEOGLYCAN. 

SQ SEQUENCE 322 AA; 36762 MW; AEA1D1E8 CRC32; 

Query Match 16.8%; Score 184; DB 11; Length 322; 

Best Local Similarity 34.1%; Pred. No. 2.37e-16; 

Matches 28; Conservative 27; Mismatches 25; Indels 2; Gaps 2; 

Db 121 CTCISTTVYCDDHELDAIP-PLPKKTTYFYSRFNRIKKIN-KNDFASLNDLKRIDLTSNL 178 

I I :||| I: : |::|| ::|: :| : I I :: :::: |::| :||::| 
Qy 74 CDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSGNNISTVDLNSNIHVLENLEXLDLSNNH 133 

Db 179 ISEIDEDAFRKLPHLQELVLRD 200 

I: I:: :l 11= hll I I 
Qy 134 ITFINDKSFEKLSKLRELXLND 155 



RESULT 14 

ID Q90944 PRELIMINARY; PRT; 316 AA. 

AC Q90944; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-JAN-1999 (TREMBLREL. 09, LAST ANNOTATION UPDATE) 

DE PROTEOGLYCAN CORE PROTEIN PRECURSOR. 

OS GALLUS GALLUS (CHICKEN) . 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ARCHOSAURIA; AVES; 

OC NEOGNATHAE; GALLIFORMES; PHASIANIDAE; PHASIANINAE; GALLUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=WHITE LEGHORN; TISSUE=TIBIAL CARTILAGE; 

RX MEDLINE; 92112771. 

RA SHINOMURA T., KIMATA K.; 

RT "Proteoglycan-Lb, a small dermatan sulfate proteoglycan expressed in 

RT embryonic chick epiphyseal cartilage, is structurally related to 

RT osteoinductive factor."; 

RL J. BIOL. CHEM. 267:1265-1270(1992). 

DR EMBL; D10485; D1001838; -. 

DR PFAM; PF00560; LRR; 3. 

KW SIGNAL. 

FT SIGNAL 1 23 POTENTIAL. 

FT CHAIN 24 316 PROTEOGLYCAN CORE PROTEIN. 

SQ SEQUENCE 316 AA; 35856 MW; CEAC004B CRC32; 

Query Match 16.7%; Score 182; DB 13; Length 316; 

Best Local Similarity 36.1%; Pred. No. 5.67e-16; 

Matches 30; Conservative 23; Mismatches 26; Indels 4; Gaps 4; 

Db 115 CTCLGTTVYCDDRELDAVP-PLPK-NTMYFYSRYNRIRKIN-KNDFANLNNLKRIDLTAN 171 

I I Mil I: I h::l ::|: I : I I I :: :::: |:|| :||: I 
Qy 74 CDCYGTTVDCNKRGLNTIPTSIPRFATQLLLSG-NNISTVDLNSNIHVLENLEXLDLSNN 132 

Db 172 LISEIHEDAFRRLPQLLELVLRD 194 

I: I:: :| :|: I II I I 
Qy 133 HITFINDKSFEKLSKLRELXLND 155 



RESULT 15 

ID Q26388 PRELIMINARY; PRT; 1385 AA. 

AC Q26388; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE TLR=TOLL-LIKE RECEPTOR. 

GN TLR. 

OS DROSOPHILA MELANOGASTER (FRUIT FLY). 

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 95151581. 

RA CHIANG C., BEACHY P.A.; 

RT "Expression of a novel Toll-like gene spans the parasegment boundary 

RT and contributes to hedgehog function in the adult eye of 

RT Drosophila."; 

RL MECH. DEV. 47:225-239(1994). 

DR EMBL; S76155; G913248; -. 

DR FLYBASE; FBgn0004364; 18w. 

DR PFAM; PF00560; LRR; 13. 

SQ SEQUENCE 1385 AA; 154848 MW; 60273533 CRC32; 

Query Match 16.7%; Score 182; DB 5; Length 1385; 

Best Local Similarity 29.1%; Pred. No. 5.67e-16; 

Matches 50; Conservative 44; Mismatches 59; Indels 19; Gaps 14; 

Db 679 NPFECDCSMEWLQRINNLTTRQHPHWDLGNIECLMPHSRSAPLRPLASLSASDFVCKYE 738 

II lll:::|l :|| I I :: : : I II : :|:|::: I II 
Qy 2 NPX ICDCNLQWLAQIN- L — QK-NI- ETSGARCEQPK - R- LRKKKFATLPPNKFKCK - G 52 

Db 739 SHCPPTCHCCEYEQCECEVICPGNCSCFHDATWATNIVDCGRQDLAALPNRIPQDVSDLY 798 

I : : I : III I I: :l I II! :: I ::| II: ::| 

Qy 53 SE-SFVSMY-A-DSCFIDSICPTQCDCY--GT-T-VDCNKRGLNTIPTSIPRFATQLL 103 

Db 799 LDGNNMPELEV-GHLTGRRNLRALYLNASNLMTLQNGSLAQLVNLRVLHLEN 849 

I III:: ::: ::: II I |: ::: : : |: I :|| I |:: 
Qy 104 LSGNNISTVDLNSNIHVLENLEXLDLSNNHITFINDKSFEKLSKLRELXLND 155 



Search completed: Fri May 28 09:06:08 1999 
Job time : 40 secs. 

Release 3.1A John F. Collins, Biocomputing Research Unit. 
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm 

Run on: Fri May 28 09:08:00 1999; MasPar time 27.18 Seconds 
575.102 Million cell updates/sec 

Tabular output not generated 



Title:          MJS-09-191-647-9 
Description:    (1-735) from US09191647.pep 
Perfect Score:  5438 
                1 SNKNLTSFPSRIPFDTTELY TVHIIRQCQCEPTKSVLSEK 735 

Scoring table:  PAM 150 
                Gap 11 

170751 seqs, 21266608 residues 
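
The throughput figure in the run header can be cross-checked from the query length, the database size and the run time. The sketch below assumes that one "cell update" is one (query residue, database residue) pair of the Smith-Waterman matrix; the listing does not define the term, and the variable names are illustrative only.

```python
# Cross-check of the "Million cell updates/sec" figure in the run header.
# Assumption (not stated in the listing): one cell update corresponds to one
# (query residue, database residue) pair of the Smith-Waterman matrix.

query_length = 735             # residues in the query, "(1-735)"
database_residues = 21266608   # "170751 seqs, 21266608 residues"
maspar_seconds = 27.18         # "MasPar time 27.18 Seconds"

cell_updates = query_length * database_residues
rate_millions = cell_updates / maspar_seconds / 1e6

print(f"{rate_millions:.3f} million cell updates/sec")
```

Under that assumption the computed rate falls within a small fraction of a percent of the reported 575.102, a gap that is plausibly due to the run time being printed to two decimals.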



Post-processing: Minimum Match 0% 

Listing first 45 summaries 



1:part1 2:part2 3:part3 4:part4 5:part5 6:part6 7:part7 
8:part8 9:part9 10:part10 11:part11 12:part12 13:part13 
14:part14 15:part15 16:part16 17:part17 18:part18 
19:part19 20:part20 21:part21 22:part22 23:part23 
24:part24 25:part25 26:part26 27:part27 28:part28 
29:part29 30:part30 31:part31 32:part32 33:part33 
34:part34 35:part35 36:part36 37:part37 38:part38 
39:part39 

Statistics: Mean 36.837; Variance 203.463; scale 0.181 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



SUMMARIES 

Result          Query 
   No.   Score  Match  Length  DB  ID       Description             Pred. No. 
     1    1813   33.3    1480   5  R25079   Drosophila SLIT prote   2.28e-126 
     2    1673   30.8    1534  30  W46966   Amino acid sequence o   1.47e-115 
     3     533    9.8     727  21  W11719   C-Delta-1 polypeptide   1.02e-28 
     4     533    9.8     740  21  W00876   C-Delta-1 polypeptide   1.02e-28 
     5     525    9.7     520  25  W18348   Proliferation and dif   3.98e-28 
     6     527    9.7     685  37  W80813   Nucleotide sequence o   2.83e-28 
     7     525    9.7     702  25  W18349   Proliferation and dif   3.98e-28 
     8     527    9.7     722  21  W11720   M-Delta-1 polypeptide   2.83e-28 
     9     525    9.7     723  25  W18353   Proliferation and dif   3.98e-28 
    10     521    9.6     833   6  R28960   Delta Dll.              7.87e-28 
    11     489    9.0    1036  25  W18351   Proliferation and dif   1.81e-25 
    12     489    9.0    1187  25  W18352   Proliferation and dif   1.81e-25 
    13     489    9.0    1218  29  W44301   Human serrate 1.        1.81e-25 
    14     489    9.0    1218  19  W05833   Human Serrate-1 (HJ1)   1.81e-25 
    15     484    8.9    1208  28  W40827   Human Jagged protein.   4.22e-25 
    16     482    8.9    1872  36  W68510   Partial human Notch-3   5.93e-25 
    17     486    8.9    2321  36  W49698   Human Notch3 protein.   3.01e-25 
    18     471    8.7    1193  19  W05835   Chick Serrate.          3.82e-24 
    19     475    8.7    1218  25  W18354   Proliferation and dif   1.94e-24 
    20     455    8.4    1055  29  W44298   Human serrate 2 prote   5.70e-23 
    21     455    8.4    1212  29  W44299   Human serrate 2.        5.70e-23 
    22     455    8.4    1257  19  W05834   Human Serrate-2 (HJ2)   5.70e-23 
    23     445    8.2     612  28  W39256   Human partial mature    3.08e-22 
    24     445    8.2     737  28  W39257   Human membrane protei   3.08e-22 
    25     420    7.7     660  21  W11725   H-Delta-1 polypeptide   2.06e-20 
    26     416    7.6     383  10  R56166   Neuroendocrine tumor    4.02e-20 
    27     401    7.4     385  10  R56167   Neuroendocrine tumor    4.95e-19 
    28     398    7.3    1404   7  R38304   Sequence of a serrate   8.18e-19 
    29     389    7.2     157  21  W11730   H-Delta-1 polypeptide   3.67e-18 
    30     374    6.9     473  17  R86869   Adhesive protein.       4.46e-17 
    31     353    6.5     196   5  R29102   Drosophila SLIT prote   1.45e-15 
    32     338    6.2    2189   1  R05222   Antigen GX5401FL enco   1.72e-14 
    33     331    6.1     228  30  W46967   Amino acid sequence o   5.42e-14 
    34     311    5.7    2707  24  W27161   Mouse receptor ME2.     1.43e-12 
    35     307    5.6    2409   3  R12609   Versican.               2.75e-12 
    36     300    5.5      77   6  R28962   ELR-11 and -12.         8.59e-12 
    37     286    5.3    1091  27  W41641   Sequence used in dete   8.31e-11 
    38     275    5.1     751  10  R53088   Human masking protein   4.90e-10 
    39     275    5.1     752  10  R53087   Human masking protein   4.90e-10 
    40     275    5.1     756  10  R53086   Human masking protein   4.90e-10 
    41     275    5.1     845  10  R53089   Human masking protein   4.90e-10 
    42     273    5.0    1257   9  R46627   Neurocan core protein   6.76e-10 
    43     264    4.9    1355   3  R14584   TGF beta 1 binding pr   2.86e-09 
    44     267    4.9    1712   4  R22461   Masking protein high    1.77e-09 
    45     261    4.8     240  33  W64219   Human secreted protei   4.62e-09 
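
The "Query Match" column appears to be the hit score expressed as a percentage of the Perfect Score (5438) given in the run header. This relationship is inferred from the listed numbers rather than stated in the output, so the sketch below is a consistency check under that assumption; the function name is hypothetical.

```python
# Consistency check: Query Match (%) ~ 100 * Score / Perfect Score.
# Assumption: the relationship is inferred from the listed values; the
# search output itself does not state how the column is derived.

PERFECT_SCORE = 5438  # "Perfect Score" from the run header


def query_match_percent(score, perfect_score=PERFECT_SCORE):
    """Return the score as a percentage of the perfect score, to one decimal."""
    return round(100.0 * score / perfect_score, 1)


# (result number, score, Query Match as listed in the summary table)
listed_rows = [(1, 1813, 33.3), (2, 1673, 30.8), (3, 533, 9.8), (45, 261, 4.8)]

for number, score, listed in listed_rows:
    print(number, score, query_match_percent(score), listed)
```

For the rows sampled here the computed and listed percentages agree to the printed precision.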



RESULT 1 

ID R25079 standard; Protein; 1480 AA. 
AC R25079; 

DT 05-JAN-1993 (first entry) 

DE Drosophila SLIT protein involved in axon pathway development. 

KW Neurogenesis; EGF-like repeats; epidermal growth factor; TAGON; 

KW embryonic CNS; leucine-rich repeat; Flank-LRR-Flank; 

KW midline glial cells; axonogenesis; cell-cell interaction; ss. 

OS Drosophila melanogaster. 

FH Key Location/Qualifiers 

FT peptide 1..36 

FT /label= signal 

FT domain 73..294 

FT /label= Flank_LRR_Flank_1 

FT /note= "mediates adhesive events" 

FT domain 295..518 

FT /label= Flank_LRR_Flank_2 

FT /note= "mediates adhesive events" 

FT domain 519..714 

FT /label= Flank_LRR_Flank_3 

FT /note= "mediates adhesive events" 

FT domain 715..910 

FT /label= Flank_LRR_Flank_4 

FT /note= "mediates adhesive events" 

FT region 911..1150 

FT /label= Tandem_EGF_like_repeats 

FT /note= "involved in protein-protein interactions" 

FT region 1353..1393 

FT /label= 7th_EGF_like_repeat 

FT /note= "involved in receptor-ligand interactions" 

FT region 1394..1404 

FT /label= alternative_splice_segment 

FT /note= "developmentally regulated" 

FT region 1405..1480 

FT /label= C-terminal_region 

PN WO9210518-A. 

PD 25-JUN-1992. 

PF 27-NOV-1991; U09055. 

PR 07-DEC-1990; US-624135. 

PA (UYYA ) UNIV YALE. 

PI Artavanis-Tsakonas S, Rothberg JM; 

DR WPI; 92-234590/28. 

DR N-PSDB; Q25811. 

PT SLIT protein and sequence elements for treating 

PT neurodegenerative disease - useful for Alzheimer's disease, 

PT nerve damage and Parkinson's disease, for diagnosis of cancer 

PS Claim 1; Page 84-89; 122pp; English. 

CC The SLIT protein is necessary for normal development of the midline 

CC of the CNS, partic. the midline glial cells, and for the 

CC concomitant formation of the commisural axon pathways. The process 

CC is dependent on the level of SLIT protein expression. It appears 

CC that SLIT protein is excreted by the midline glial cells where it 

CC is synthesised and is eventually associated with the surfaces of 

CC axons that traverse them. The SLIT protein is tightly localised to 

CC the muscle attachment sites and to the sites of contact between 

CC adjacent pairs of cardioblasts as they coalesce to form the lumen 

CC of the larval heart. The SLIT protein defines a new set of 

CC molecules (TAGONS) which play a key role in axon outgrowth and 

CC pathfinding. SLIT can be used as a nerve regenerative in 

CC neurodegenerative diseases such as Alzheimer's Disease, spinal cord 

CC injuries, brain injuries, crushed (optic) nerve, amytrophic lateral 

CC sclerosis, diabetes-caused nerve damage, Parkinson's Disease, 

CC strokes, epilepsy, multiple sclerosis, paraplegia retinal 

CC degeneration and facial nerve damage. The 4 "Flank-Leucine-rich 

CC region-Flank" domains, the C-terminal region and part of the 

CC alternative splice segment (i.e. GEGSTEPFTVT) are all individually 

CC claimed as are molecules comprising at least 1 Flank-LRR-Flank domain 

CC and at least 1 EGF-like repeat element from the SLIT protein. 

CC See also R29102. 

SQ Sequence 1480 AA; 

Query Match 33.3%; Score 1813; DB 5; Length 1480; 

Best Local Similarity 41.1%; Pred. No. 2.28e-126; 



[Alignment rows for Db 729-1311 / Qy 1-536 and the Matches line are not legible in the source scan. Block start positions (Db/Qy): 729/1, 789/61, 849/121, 909/181, 969/241, 1015/301, 1075/361, 1135/420, 1195/478, 1254/537.] 



Qy 537 QIVENSGKSDQLITKGKEMLYIGGLPIEKSQDAKRRFHVKNSESLKGCISSITINEVPIN 596 

Db 1312 fgnaqrqqkitpgca 1326 

: I : :h 
Qy 597 LQQALENVNTEQSCS 611 



RESULT 2 

ID W46966 standard; Protein; 1534 AA. 

AC W46966; 

DT 06-JUL-1998 (first entry) 

DE Amino acid sequence of a human slit-like polypeptide. 

KW Slit-like protein; human; diagnosis; treatment; brain-specific disease; 

KW cancer; antibody. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT Peptide 1..26 

FT /note= "signal peptide" 

FT Protein 27..1534 

FT /note= "mature protein" 

PN J10087699-A. 

PD 07-APR-1998. 

PF 15-JUL-1997; 205351. 

PR 16-JUL-1996; JP-186219. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

DR WPI; 98-267127/24. 

DR N-PSDB; V16978. 

PT Human Slit-like protein - useful for diagnosis and treatment of 

PT brain-specific diseases and cancers 

PS Disclosure; Pages 31-35; 45pp; Japanese. 

CC The present sequence represents a novel human slit-like protein (the 

CC mature protein is claimed in Claim 1). The slit-like polypeptide is 

CC useful for diagnosis and treatment of brain-specific diseases and 

CC cancers. Antibodies directed against the protein, or its fragments 

CC can also be used for diagnosing cancer. 

SQ Sequence 1534 AA; 

Query Match 30.8%; Score 1673; DB 30; Length 1534; 

Best Local Similarity 38.8%; Pred. No. 1.47e-115; 

Matches 255; Conservative 164; Mismatches 203; Indels 36; Gaps 24; 

Db 748 snkhlralpkgipknvtelyldgnqftlvpgq-lstfkylqlvdlsnnkisslsnssft 805 

111:1 ::| II = lllllhl : =h: h : I I :|lhh: II |::h 
Qy 1 SNKNLTSFPSRIPFDTTELYLDANYINEIPAHDLNRL-YSLTKLDLSHNRLISLENNTFS 59 

Db 806 nmsqlttlilsynalqcipplafqglrslrllslhgndistlqegifadvtslshlaiga 865 

h::hllhlll hh Mil II :lhlllllllll I :: I : 1 1 : : I : I : I : 
Qy 60 NLTRLSTLIISYNKLRCLQPLAFNGLNALRILSLHGNDISFLPQSAFSNLTSITHIAVGS 119 

Db 866 nplycdchlrwlsswvktgykepgiarcagpqdmegklllttpakkfecqgpptlavqak 925 
hllllh: hi hh : MINI I : ||||: : I |:: : :| 

Qy 120 NSLYCDCNMAWFSKWIKSKFIEAGIARCEYPNTVSNQLLLTAQPYQFTCDSKVPTKLATK 179 

Db 926 cdlclsspcqnqgtchndplevyrcacpsgykgrdcevslnscssgpcenggtchaqege 985 

llllhlll I : I : I I I :h I II :::| ::H I :|l :: 
Qy 180 CDLCLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKV--AQ 237 

Db 986 dapftcscptgfegptcgvntddcvdhacangg v--c---v-d--g--vgnytcq 1030 

: I I I llll I I lllh I III I ,: 

Qy 238 AGRFNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCD 297 

Db 1031 cplqyegkaceqlvdlcspdlnpcqheaqcvgtpdgprcecmpgyagdncsenqddcrdh 1090 

lh:|lll lh :: h lllh::: |: : | | ||::|:|| | |||:: 
Qy 298 CPMEYEGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNV 357 

Db 1091 rcqngaqcmdevnsysclcaegysgqlceipphlp-a-pks-pcegtecqngancvd-qg 1146 

lllh hi : II III Ihll lllll : h :h : M :|l I 
Qy 358 ECQNGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQG-ECVASQN 416 

Db 1147 nrp-vcqclpgfggpecekllsvnfvdrdtylqftdlqnwpranitlqvstaedngilly 1205 

: I I Ihll h: ill I : :|| : I : : ||: : |: lllll 
Qy 417 SSDFTCKCHEGFSGPSCDRQMSVGFKNPGAYLALDPLAS-D-GTITMTLRTTSKIGILLY 474 
Db 1206 ngdndhiavelyqghvrvsydpgsypssaiysaetindgqfhtvelvafdqmvnlsidgg 1265 

II: :: ll|:|:|:: I |::|:| :||: :||| I : : ; ;:: | || 
Qy 475 YGDDHFVSAELYDGRVKLVYYIGNFPASHMYSSVKVNDGLPHRISIRTSERKCFLQIDKN 534 

Db 1266 spmtmdnfgkhytlnsea-p-lyvggmpvdvnsaafrlwqilngtgfhgcirnlyinn-- 1321 

: "Ml I : : 11 = 11 = 1" : II :: h :: III :: ||: 
Qy 535 PVQIVENSGKSDQLITKGKEMLYIGGLPIEKSQDAKRRFHVKNSESLKGCISSITINEVP 594 

Db 1322 -elqd-ftktqmkpgvvpgcepcrklyclhgicqpna-tp-gpmchceagwvglhcdq 1375 

Ml: : : : : : I : I : I I 1 1 : 1 I ||:|:: I III: 
Qy 595 INLQQALENVNTEQSCSATVNFCAGIDCGNGKCTNNALSPKGYMCQCDSHFSGEHCDE 652 
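
The "Best Local Similarity" percentages in these result blocks are consistent with the number of exact matches divided by the total number of alignment columns (matches, conservative substitutions, mismatches and indels). That definition is inferred from the listed counts, not stated by the program, so the following is a hedged consistency check with a hypothetical function name.

```python
# Consistency check: Best Local Similarity (%) ~ Matches / alignment columns.
# Assumption: alignment columns = Matches + Conservative + Mismatches + Indels;
# this is inferred from the listed counts, not documented in the output.


def best_local_similarity(matches, conservative, mismatches, indels):
    """Return exact matches as a percentage of all alignment columns."""
    aligned_columns = matches + conservative + mismatches + indels
    return 100.0 * matches / aligned_columns


# (ID, Matches, Conservative, Mismatches, Indels, similarity as listed)
listed_hits = [
    ("W46966", 255, 164, 203, 36, 38.8),
    ("W11719", 79, 40, 68, 17, 38.7),
    ("W80813", 95, 45, 87, 24, 37.8),
]

for hit_id, m, c, x, i, listed in listed_hits:
    print(hit_id, f"{best_local_similarity(m, c, x, i):.1f}", listed)
```

Gaps are not added to the denominator here because the Indels count already covers every gapped column; the Gaps figure counts gap openings.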



RESULT 3 

ID W11719 standard; Protein; 727 AA. 

AC W11719; 

DT 28-APR-1997 (first entry) 

DE C-Delta-1 polypeptide. 

KW C-Delta-1; cell proliferation; nervous system disorder; 

KW tissue regeneration; Notch; cervix cancer; breast cancer; 

KW lung cancer; colon cancer; melanoma; seminoma; 

KW neurogenesis; therapy. 

OS Gallus sp. 

FH Key Location/Qualifiers 

FT domain 184..228 

FT /label= DSL 

FT domain 229..261 

FT /label= EGF1 

FT domain 262..292 

FT /label= EGF2 

FT domain 293..332 

FT /label= EGF3 

FT domain 333..370 

FT /label= EGF4 

FT domain 371..409 

FT /label= EGF5 

FT domain 410..447 

FT /label= EGF6 

FT domain 448..485 

FT /label= EGF7 

FT domain 486..523 

FT /label= EGF8 

FT domain 524..534 

FT /label= EGF9 

FT domain 555..579 

FT /label= TM 

FT /note= "transmembrane domain" 

PN WO9701571-A1. 

PD 16-JAN-1997. 

PF 28-JUN-1996; U11178. 

PR 28-JUN-1995; US-000589. 

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY. 

PA (UYYA ) UNIV YALE. 

PI Artavanis-Tsakonas S, Gray GE, Henrique D, Ish-Horowicz D; 

PI Lewis J; 

DR WPI; 97-100159/09. 

DR N-PSDB; T58897. 

PT New vertebrate Delta protein, DNA and antibodies - for treating and 

PT preventing cancer, nervous system disorders and for tissue 

PT regeneration 

PS Disclosure; Fig 2; 135pp; English. 

CC C-delta-1 polypeptide (W11719) is the chick homologue of Drosophila 

CC Delta, a protein that binds to Notch protein. Expression of 

CC C-Delta-1 correlates with onset of neurogenesis. The C-delta-1 

CC amino acid sequence was deduced from a cDNA clone (T58897) obtd. 

CC from chick stage 4-6 embryos. An alternatively spliced variant 

CC (W00876) was also isolated, and mouse (W11720) and human (W11721- 

CC 38) Delta-1 polypeptides have been identified. Delta-1 proteins 

CC can be used to treat or prevent disorders characterised by 

CC increased Notch activity, such as cervical, breast, lung or colon 

CC cancer, melanoma or seminoma, and nervous system disorders or to 

CC promote tissue regeneration and repair. 

SQ Sequence 727 AA; 



Query Match 9.8%; Score 533; DB 21; Length 727; 

Best Local Similarity 38.7%; Pred. No. 1.02e-28; 

Matches 79; Conservative 40; Mismatches 68; Indels 17; Gaps 12; 

Db 302 hkpckngatctntgqgsytcscrpgytgssceieinecdanpcknggsctdlens-ysct 360 

: llll I I I: IN: ||: I || :|: | ::|| | ::| : : ::| 
Qy 185 NSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFNCY 244 

Db 361 cppgfygknce--lsa-m-t-cadgp-cfnggr-ctd---n--p-d-ggyscrcplgysg 406 

I II I II : : : I :| I : I |:: I : : :| I II: I I 
Qy 245 CNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEYEG 304 

Db 407 fncekkidycss--spcangaqcvdlgnsyicqcqagftgrhcddnvddcasfpcvnggt 464 

ill :|| I : |: : Nil = 1111 :|: hill : I |||: 

Qy 305 KHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNGGS 364 

Db 465 cqdgvndysctcppgyngkncstp 488 

III: I I I III I I I 
Qy 365 CVDGILSYDCLCRPGYAGQYCEIP 388 



RESULT 4 

ID W00876 standard; Protein; 740 AA. 

AC W00876; 

DT 28-APR-1997 (first entry) 

DE C-Delta-1 polypeptide (alternatively spliced variant). 

KW C-Delta-1; cell proliferation; nervous system disorder; 

KW tissue regeneration; Notch; cervix cancer; breast cancer; 

KW lung cancer; colon cancer; melanoma; seminoma; 

KW neurogenesis; therapy. 

OS Gallus sp. 

FH Key Location/Qualifiers 

FT domain 184..228 

FT /label= DSL 

FT domain 229..261 

FT /label= EGF1 

FT domain 262..292 

FT /label= EGF2 

FT domain 293..332 

FT /label= EGF3 

FT domain 333..370 

FT /label= EGF4 

FT domain 371..409 

FT /label= EGF5 

FT domain 410..447 

FT /label= EGF6 

FT domain 448..485 

FT /label= EGF7 

FT domain 486..523 

FT /label= EGF8 

FT domain 524..534 

FT /label= EGF9 

FT domain 555..579 

FT /label= TM 

FT /note= "transmembrane domain" 

PN WO9701571-A1. 

PD 16-JAN-1997. 

PF 28-JUN-1996; U11178. 

PR 28-JUN-1995; US-000589. 

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY. 

PA (UYYA ) UNIV YALE. 

PI Artavanis-Tsakonas S, Gray GE, Henrique D, Ish-Horowicz D; 

PI Lewis J; 

DR WPI; 97-100159/09. 

DR N-PSDB; T58898. 

PT New vertebrate Delta protein, DNA and antibodies - for treating and 

PT preventing cancer, nervous system disorders and for tissue 

PT regeneration 

PS Disclosure; Fig 2; 135pp; English. 

CC C-delta-1 polypeptide (W00876) is the chick homologue of Drosophila 

CC Delta, a protein that binds to Notch protein. Expression of 

CC C-Delta-1 correlates with onset of neurogenesis. The C-delta-1 

CC amino acid sequence was deduced from a cDNA clone (T58898) obtd. 

CC from chick stage 4-6 embryos. A shorter version (W58877) of 

CC C-Delta-1, lacking the 12 C-terminal amino acids of the longer 

CC version, was also isolated, and mouse (W11720) and human (W11721- 

CC 38) Delta-1 polypeptides have been identified. Delta-1 proteins 

CC can be used to treat or prevent disorders characterised by 

CC increased Notch activity, such as cervical, breast, lung or colon 

CC cancer, melanoma or seminoma, and nervous system disorders or to 

CC promote tissue regeneration and repair. 

SQ Sequence 740 AA; 

Query Match 9.8%; Score 533; DB 21; Length 740; 

Best Local Similarity 38.7%; Pred. No. 1.02e-28; 

Matches 79; Conservative 40; Mismatches 68; Indels 17; Gaps 12; 

Db 302 hkpckngatctntgqgsytcscrpgytgssceieinecdanpcknggsctdlens-ysct 360 

: llll I I I: 111:1 II: I II :|: I ::|| I ::| : : ::| 
Qy 185 NSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFNCY 244 

Db 361 cppgfygknce--lsa-m-t-cadgp-cfnggr-ctd---n--p-d-ggyscrcplgysg 406 

f I M I II : : : | :| | : | I:: I : : :| I ||: | 
Qy 245 CNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEYEG 304 

Db 407 fncekkidycss--spcangaqcvdlgnsyicqcqagftgrhcddnvddcasfpcvnggt 464 

:H l"ll: '\\ I : |: : llll :|||| :|: hill : I II: 
Qy 305 KHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNGGS 364 

Db 465 cqdgvndysctcppgyngkncstp 488 

I II: I I I III I I I 
Qy 365 CVDGILSYDCLCRPGYAGQYCEIP 388 



RESULT 5 

ID W18348 standard; protein; 520 AA. 

AC W18348; 

DT 11-FEB-1998 (first entry) 

DE Proliferation and differentiation suppression polypeptide. 

KW Proliferation; differentiation; suppression; human; delta-1; 

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour; 

KW immunosuppression. 

OS Homo sapiens. 

PN WO9719172-A1. 

PD 29-MAY-1997. 

PF 15-NOV-1996; J03356. 

PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 3; Page 59-61; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the 

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression. 

SQ Sequence 520 AA; 

Query Match 9.7%; Score 525; DB 25; Length 520; 

Best Local Similarity 38.7%; Pred. No. 3.98e-28; 

Matches 79; Conservative 41; Mismatches 67; Indels 17; Gaps 12; 

Db 274 hkpckngatctntgqgsytcscrpgytgatcelgidecdpspcknggsctdlens-ysct 332 

: llll I I I: llhl II: I II II I III I ::| : : ::| 
Qy 185 NSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFNCY 244 

Db 333 cppgfygkice--lsa-m-t-cadgp-cfnggr-csd---s--p-d-ggyscrcpvgysg 378 

I II I II : : : I :| I : | ||: : : : :| | ||: | | 
Qy 245 CNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEYEG 304 



Db 379 fncekkidycss--spcsngakcvdlgdaylcrcqagfsgrhcddnvddcasspcanggt 436 

:| |::||: :|| I :||: : :| | | :||:| :|: |:||| : I III: 
Qy 305 KHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNGGS 364 

Db 437 crdgvndfsctcppgytgrncsap 460 

I II: : I I IM:I: I I 
Qy 365 CVDGILSYDCLCRPGYAGQYCEIP 388 



RESULT 6 

ID W80813 standard; Protein; 685 AA. 

AC W80813; 

DT 16-FEB-1999 (first entry) 

DE Nucleotide sequence of the human Delta 3 protein. 

KW Human; Delta 3 protein; agonist; tissue regeneration; 

KW neurodegenerative disease; neurodifferentiative disorder; 

KW neurodevelopmental disorder; peripheral neuropathy; 

KW spinocerebella degeneration; antagonist; neoplastic disease; 

KW hyperplastic disease; cancer; Waldenstroem's macroglobulemia; 

KW fibroproliferative disorder; cerebravascular tissue; gene therapy; 

KW antibody. 

OS Homo sapiens. 

PN WO9845434-A1. 

PD 15-OCT-1998. 

PF 06-APR-1998; U06775. 

PR 11-JUN-1997; US-872855. 

PR 04-APR-1997; US-832633. 

PA (MILL-) MILLENNIUM BIOTHERAPEUTICS INC. 

PI Gearing DP, McCarthy SA; 

DR WPI; 98-594482/50. 

DR N-PSDB; V68523. 

PT New isolated human Delta3 gene - used to develop products for 

PT treating, e.g. nerve injury, neurodegenerative disorders, peripheral 

PT neuropathies and spinocerebella degenerations 

PS Claim 2; Fig 1; 160pp; English. 

CC This is the amino acid sequence of the human Delta 3 protein 

CC used in the method of the invention. The Delta3 gene is involved in 

CC the growth and differentiation of cells. Delta3 agonists can be used 

CC for promoting the tissue regeneration or repair needed to treat a 

CC nerve injury, neurodegenerative disease, neurodifferentiative or 

CC neurodevelopmental disorders including peripheral neuropathies and 

CC spinocerebella degenerations. Delta3 antagonists can be used to treat 

CC neoplastic or hyperplastic diseases, e.g. cancers, Waldenstroem's 

CC macroglobulemia and fibroproliferative disorders, particularly of 

CC cerebravascular tissue. The nucleic acids can also be used for gene 

CC therapy. The products can also be used for antibody production, 

CC detection, diagnosis and drug screening. 

SQ Sequence 685 AA; 

Query Match 9.7%; Score 527; DB 37; Length 685; 

Best Local Similarity 37.8%; Pred. No. 2.83e-28; 

Matches 95; Conservative 45; Mismatches 87; Indels 24; Gaps 11; 

Db 291 hspckngatcsnsgqrsytctcrpgytgvdcelelsecdsnpcrnggsckdqedg-yhcl 349 

:|MII II :: Mil I II: II II :: I ::|| I ::|l : I ::| 
Qy 185 NSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFNCY 244 

Db 350 cppgyyglhcehstlscadspcfnggsc — r — e-rn-qg-an-yacecppnftg 396 

I I: I II : I :| I III I I I :| |: I I |:|| :: I 

Qy 245 CNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEYEG 304 

Db 397 sncekkvdrcts--npcanggqclnrgpsrmcrcrpgftgtycelhvsdcarnpcahggt 454 

:N II III I I |: III Mill II :: II I :lh 
Qy 305 KHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNGGS 364 

Db 455 chdlenglmctcpagf sgrrcevrt- -si dacasspcfnratcytdlstdtfvcn 507 

II : I I :|::|: II: : III |:| :: I : :: | |: 

Qy 365 CVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSAC-GQGECVASQNSSDFTCK 423 

Db 508 cpygfvgsrce 518 

I II I: I: 
Qy 424 CHEGFSGPSCD 434 



RESULT 7 

ID W18349 standard; protein; 702 AA. 

AC W18349; 

DT 11-FEB-1998 (first entry) 

DE Proliferation and differentiation suppression polypeptide. 

KW Proliferation; differentiation; suppression; human; delta-1; 

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour; 

KW immunosuppression. 

OS Homo sapiens. 

PN WO9719172-A1. 

PD 29-MAY-1997. 

PF 15-NOV-1996; J03356. 

PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 4; Page 61-64; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the 

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression. 

SQ Sequence 702 AA; 

Query Match 9.7%; Score 525; DB 25; Length 702; 

Best Local Similarity 38.7%; Pred. No. 3.98e-28; 



[Alignment rows and the Matches line are not legible in the source scan. Block start positions (Db/Qy): 274/185, 333/245, 379/305, 437/365.] 



RESULT 8 

ID W11720 standard; Protein; 722 AA. 

AC W11720; 

DT 28-APR-1997 (first entry) 

DE M-Delta-1 polypeptide. 

KW M-Delta-1; cell proliferation; nervous system disorder; 

KW tissue regeneration; Notch; cervix cancer; breast cancer; 

KW lung cancer; colon cancer; melanoma; seminoma; 

KW neurogenesis; therapy. 

OS Mus sp. 

PN WO9701571-A1. 

PD 16-JAN-1997. 

PF 28-JUN-1996; U11178. 

PR 28-JUN-1995; US-000589.

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY.

PA (UYYA ) UNIV YALE. 

PI Artavanis-Tsakonas S, Gray GE, Henrique D, Ish-Horowicz D; 

PI Lewis J; 

DR WPI; 97-100159/09. 

DR N-PSDB; T58899. 

PT New vertebrate Delta protein, DNA and antibodies - for treating and 

PT preventing cancer, nervous system disorders and for tissue 



PT regeneration 

PS Claim 4; Fig 8; 135pp; English. 

CC M-delta-1 polypeptide (W11720) is the mouse homologue of Drosophila 

CC Delta, a protein that binds to Notch protein. It is expressed

CC primarily in presomitic mesoderm, the central and peripheral 

CC nervous systems, and kidney. Chick (W11719) and human (W11721- 

CC 38) Delta-1 polypeptides have also been identified. Delta-1 

CC proteins can be used to treat or prevent disorders characterised by 

CC increased Notch activity, such as cervical, breast, lung or colon 

CC cancer, melanoma or seminoma, as well as nervous system disorders, 

CC and to promote tissue regeneration and repair.

SQ Sequence 722 AA; 

Query Match 9.7%; Score 527; DB 21; Length 722; 

Best Local Similarity 35.6%; Pred. No. 2.83e-28;

[Alignment and Matches tallies illegible in source. Db segments start at 294, 353, 399, 457, 507; Qy segments start at 185, 245, 305, 365, 425.]



RESULT 9 

ID W18353 standard; protein; 723 AA.

AC W18353;

DT 11-FEB-1998 (first entry)

DE Proliferation and differentiation suppression polypeptide.

KW Proliferation; differentiation; suppression; human; delta-1;

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour;

KW immunosuppression.

OS Homo sapiens.

FH Key Location/Qualifiers

FT Peptide 1..21

FT /label= Signal

FT Protein 22..723

FT /label= Differentiation_suppression_protein

PN WO9719172-A1.

PD 29-MAY-1997.

PF 15-NOV-1996; J03356.

PR 30-NOV-1995; JP-311811.

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

DR N-PSDB; T70174. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 15; Page 77-82; 114pp; Japanese.

CC The present sequence represents a polypeptide which suppresses

CC proliferation and differentiation of undifferentiated cells such

CC as neurons and blood cells. The polypeptide may be used for the

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression. 

SQ Sequence 723 AA; 
Query Match 9.7%; Score 525; DB 25; Length 723;

Best Local Similarity 38.7%; Pred. No. 3.98e-28;

[Alignment and Matches tallies illegible in source. The first Db start is lost; later Db segments start at 354, 400, 458; Qy segments start at 185, 245, 305, 365.]



RESULT 10

ID R28960 standard; Protein; 833 AA.

AC R28960;

DT 01-APR-1993 (first entry)

DE Delta Dll.

KW Human; Notch; plasmid; cDNA; clone; Dll; expression library; PCR; 

KW polymerase chain reaction; primer; cloning vector; Delta; Serrate; 

KW neurogenic; toporythmic; homotypic; heterotypic; differentiation; 

KW quantitation; antibody. 

OS Homo sapiens.

PN WO9219734-A.

PD 12-NOV-1992.

PF 01-MAY-1992; 003651.

PR 03-MAY-1991; US-695189.

PR 14-NOV-1991; US-791923. 

PA (INDV ) UNIV INDIANA FOUND. 

PA (UYYA ) UNIV YALE. 

PI Artavanis-Tsakonas S, Blaumueller CM, Fehon RG, Muskavitch MAT; 

PI Rebay I, Shepard SB; 

DR WPI; 92-398861/48. 

DR N-PSDB; Q30997. 

PT Human Notch and Delta DNA and protein sequences - used for study

PT and manipulation of differentiation processes 

PS Claim 50; Fig 13; 239pp; English. 

CC The sequence given is encoded by the nucleotide sequence of human 

CC Delta gene contained in plasmid cDNA clone Dll. A human expression 

CC library was constructed and screening assays were carried out on to 

CC select for the expressed Delta product. Alternatively the sequences 

CC could be isolated by amplification using polymerase chain reaction

CC (PCR) primers. The isolated gene may be inserted into a cloning

CC vector and expressed. The Delta gene and also the Notch and Serrate

CC neurogenic genes are designated "toporythmic" genes. The proteins 

CC they encode are involved in specific homo- or heterotypic interactions 

CC crucial to differentiation. The quantitation of mRNA for human Notch 

CC and Delta and adhesive molecules, and study of its expression are 

CC possible using the DNA and antibodies raised against the Notch and 

CC Delta proteins. 

SQ Sequence 833 AA; 

Query Match 9.6%; Score 521; DB 6; Length 833; 

Best Local Similarity 37.1%; Pred. No. 7.87e-28;

Matches 95; Conservative 55; Mismatches 77; Indels 29; Gaps 22;

[Alignment largely illegible in source. Db segments start at 295, 355, 406, 461, 512; Qy segments start at 183, 239, 298, 358, 418. The one legible row follows.]

Db 406 cpigysgpncdlqldncsp--npciaggscqp-sgk--cicpagfsgtrcetniddclgh 460




RESULT 11 

ID W18351 standard; protein; 1036 AA.

AC W18351;

DT 11-FEB-1998 (first entry)

DE Proliferation and differentiation suppression polypeptide.

KW Proliferation; differentiation; suppression; human; delta-1;

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour;

KW immunosuppression.

OS Homo sapiens.

PN WO9719172-A1.

PD 29-MAY-1997.

PF 15-NOV-1996; J03356.

PR 30-NOV-1995; JP-311811.

PR 17-NOV-1995; JP-299611.

PA (ASAH ) ASAHI KASEI KOGYO KK.

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 5; Page 66-71; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the 

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression.

SQ Sequence 1036 AA;

Query Match 9.0%; Score 489; DB 25; Length 1036;

Best Local Similarity 36.3%; Pred. No. 1.81e-25;

Matches 78; Conservative 40; Mismatches 83; Indels 14; Gaps 12;

[Alignment largely illegible in source. Db segments start at 423, 480, 540, 600; Qy segments start at 183, 243, 298, 352. The one legible row follows.]

Db 480 clcptgfsgnlcqldidycepnpcqngaqcynrasdyfckcpedyegkncshlkdhcrtt 539




RESULT 12 

ID W18352 standard; protein; 1187 AA.

AC W18352;

DT 11-FEB-1998 (first entry)

DE Proliferation and differentiation suppression polypeptide.

KW Proliferation; differentiation; suppression; human; delta-1;

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour;

KW immunosuppression.

OS Homo sapiens.

PN WO9719172-A1.

PD 29-MAY-1997.

PF 15-NOV-1996; J03356.

PR 30-NOV-1995; JP-311811.

PR 17-NOV-1995; JP-299611.

PA (ASAH ) ASAHI KASEI KOGYO KK.

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 6; Page 71-76; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the 

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of

CC blood formation, e.g. after immunosuppression.

SQ Sequence 1187 AA;

Query Match 9.0%; Score 489; DB 25; Length 1187;

Best Local Similarity 36.3%; Pred. No. 1.81e-25; 
Matches 78; Conservative 40; Mismatches 83; Indels 14; Gaps 12; 

Db 423 cl-gqcqndascrdlvn-gyrcicppgyagdhcerdidecasnpclngghcq-neinrfq 479 

II : I hi I : I I I II: I III HI I "Mil : I : II 
Qy 183 CLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFN 242

Db 480 clcptgfsgnlcqldidycepnpcqngaqcynrasdyfckcpedyegkncshlkdhcrtt 539 

II III: I: :|| I : |:||: I : I :|: ::: | 

Qy 243 CYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVR--FC--SEELKNFQSFQINSY-RCD 297

Db 540 pcevidsctvamasndtpegvryissnvcgphgkcksqsggkftcdcnkgftgtycheni 599 

I : = II :||| : :|: ::| h |||| | || 

Qy 298 -CPM-E-YE-GKHCEDKLEYCT-KKLNPCENNGKCIPINGS-YSCMCSPGFTGNNCETNI 351

Db 600 ndcesnpcrnggtcidgvnsykcicsdgwegayce 634 

:M : l:lll:|:||: II hi I I III 
Qy 352 DDCKNVECQNGGSCVDGILSYDCLCRPGYAGQYCE 386 
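
The Best Local Similarity figure for this hit is consistent with the number of identical positions divided by the total number of aligned columns (Matches + Conservative + Mismatches + Indels), i.e. 78 / 215. The Python sketch below reproduces that arithmetic. It assumes this reading of the tallies; the function name is hypothetical and is not part of the search program.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent identical positions over all aligned columns (assumed definition)."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


# Tallies reported for this hit: Matches 78; Conservative 40; Mismatches 83; Indels 14
print(best_local_similarity(78, 40, 83, 14))  # 36.3
```

The same ratio also reproduces the 37.1% reported for the hit with Matches 95; Conservative 55; Mismatches 77; Indels 29.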



RESULT 13

ID W44301 standard; Protein; 1218 AA.

AC W44301;

DT 19-JUN-1998 (first entry)

DE Human serrate 1.

KW Human; serrate 2; regulation; stem cell; differentiation; neoplasm;

KW leukaemia; endothelial cell; tumour.

OS Homo sapiens.

FH Key Location/Qualifiers

FT Peptide 1..31

FT /label= Signal

FT Protein 32..1218

FT /label= Serrate-1

PN WO9802458-A1.

PD 22-JAN-1998.

PF 11-JUL-1997; J02414.

PR 14-MAY-1997; JP-124063.

PR 16-JUL-1996; JP-186220.

PA (ASAH ) ASAHI KASEI KOGYO KK.

PI Itoh A, Sakano S;

DR WPI; 98-110528/10. 

DR N-PSDB; V15201. 

PT Human serrate-2 gene expression products - used to regulate stem 

PT cell differentiation, useful in treating neoplasms, e.g. leukaemia 

PS Disclosure; Page 77-86; 103pp; Japanese. 

CC The present sequence represents human serrate 1, from the present 

CC invention which describes human serrate 2. The present invention also 

CC describes a method for the preparation of the polypeptides, and 

CC antibodies binding to the polypeptide and its fragments. The polypeptide

CC and its fragments expressed by the serrate-2-gene can be used to inhibit 

CC stem (especially blood stem) cell differentiation and to inhibit 

CC endothelial cell growth. They may be incorporated in a cell culture 

CC media for culturing undifferentiated stem cells. They can also be used 

CC for treatment of neoplasms such as leukaemia. The antibodies can be used 



CC for the diagnosis of malignant tumours.

SQ Sequence 1218 AA;

Query Match 9.0%; Score 489; DB 29; Length 1218;

Best Local Similarity 36.3%; Pred. No. 1.81e-25;

Matches 78; Conservative 40; Mismatches 83; Indels 14; Gaps 12;



Db 454 cl-gqcqndascrdlvn-gyrcicppgyagdhcerdidecasnpclngghcq-neinrfq 510 

II : I hi I : I I I II: I III :|| I ::|||| : I : II 
Qy 183 CLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFN 242 

Db 511 clcptgfsgnlcqldidycepnpcqngaqcynrasdyfckcpedyegkncshlkdhcrtt 570 

II III: I: :|| I : h|h I : || :|: ::: | 

Qy 243 CYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVR--FC--SEELKNFQSFQINSY-RCD 297

Db 571 pcevidsctvamasndtpegvryissnvcgphgkcksqsggkftcdcnkgftgtycheni 630 

I : : :: :l I II :|ll : :h ::l h llll I II 
Qy 298 -CPM-E-YE-GKHCEDKLEYCT-KKLNPCENNGKCIPINGS-YSCMCSPGFTGNNCETNI 351 

Db 631 ndcesnpcrnggtcidgvnsykcicsdgwegayce 665 

:|| : hllhhll: II hi I I III 
Qy 352 DDCKNVECQNGGSCVDGILSYDCLCRPGYAGQYCE 386



RESULT 14

ID W05833 standard; Protein; 1218 AA.

AC W05833;

DT 28-JAN-1997 (first entry)

DE Human Serrate-1 (HJ1).

KW Serrate-1; human jagged-1; HJ1; Notch; cell differentiation;

KW cell fate; central nervous system; cancer; tissue repair; therapy;

KW diagnosis; antibody.

OS Homo sapiens.

FH Key Location/Qualifiers

FT domain 1..1067

FT /label= Extracellular domain

FT peptide 14..29

FT /label= Sig_peptide

FT domain 185..229

FT /label= DSL

FT /note= "region of homology with Drosophila Delta

FT and Serrate, predicted to mediate binding

FT with Notch"

FT domain 234..896

FT /label= ELR

FT /note= "epidermal growth factor-like repeat domain"

FT region 234..264

FT /label= ELR1

FT region 265..299

FT /label= ELR2

FT region 300..339

FT /label= ELR3

FT region 340..377

FT /label= ELR4

FT region 378..415

FT /label= ELR5

FT region 416..453

FT /label= ELR6

FT region 454..490

FT /label= ELR7

FT region 491..528

FT /label= ELR8

FT region 529..566

FT /label= ELR9

FT region 567..598

FT /label= Partial_ELR

FT region 599..632

FT /label= Partial_ELR

FT region 633..670

FT /label= ELR10

FT region 671..708

FT /label= ELR11

FT region 709..747

FT /label= ELR12

FT region 748..785

FT /label= ELR13

FT region 786..823

FT /label= ELR14

FT region 824..862

FT /label= ELR15

FT region 863..879

FT /label= Partial_ELR

FT region 880..896

FT /label= Partial_ELR

FT domain 1068..1089

FT /label= Transmembrane domain

FT domain 1090..1218

FT /label= Intracellular domain

PN WO9627610-A1. 

PD 12-SEP-1996. 

PF 07-MAR-1996; 003172. 

PR 07-MAR-1995; US-400159.

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY.

PA (UYYA ) UNIV YALE.

PI Artavanis-Tsakonas S, Gray GE, Henrique DMP, Ish-Horowicz D;

PI Lewis JH, Mann RS, Myat AM;

DR WPI; 96-425379/42.

DR N-PSDB; T40090.

PT Vertebrate Serrate protein and related DNA - used to treat or

PT prevent malignancies characterised by increased Notch activity. 

PS Claim 4; Page 95-98; 161pp; English. 

CC Human Serrate-1 (W05833) and human Serrate-2 (W05833) are ligands 

CC for the zygotic neurogenic locus Notch, and are believed to play a 

CC major role in determining cell fates (differentiation) in the 

CC central nervous system. Their amino acid sequences were deduced 

CC from cDNA clones (see also T40090-91) isolated from human foetal 

CC brain cDNA libraries. The proteins, antibodies raised to them, 

CC and encoding nucleic acids can be used in the detection of 

CC Serrate sequences and in the treatment of disorders of cell fate 

CC or differentiation, partic. cancer, nervous system disorders 

CC and in tissue repair or regeneration. 

SQ Sequence 1218 AA;

Query Match 9.0%; Score 489; DB 19; Length 1218;

Best Local Similarity 36.3%; Pred. No. 1.81e-25;

[Alignment and Matches tallies illegible in source. Db segments start at 454, 511, 571, 631; Qy segments start at 183, 243, 298, 352.]



RESULT 15 

ID W40827 standard; Protein; 1208 AA.

AC W40827; 

DT 21-MAY-1998 (first entry) 

DE Human Jagged protein. 

KW Jagged; Notch; angiogenesis; endothelial cell; migration; human; 

KW wound repair; vulnerary; injury repair; signal transduction; 

KW motor neurone disease; amyotrophic lateral sclerosis; poliomyelitis;

KW diagnosis; therapy. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT Peptide 1..11

FT /label= Sig_peptide

FT Domain 175..220

FT /note= "DSL (Delta, Serrate, Lag-2 and Apx-1)

FT domain"

FT Region 224..852

FT /note= "EGF-like repeat region containing 16

FT EGF repeats"

FT Misc_difference 526

FT /note= "encoded by ANC"

FT Region 853..992

FT /note= "cysteine-rich region"

FT Domain 1058..1083

FT /note= "transmembrane domain"

FT Region 1084..1208

FT /note= "cytoplasmic region"

PN WO9745143-A1.

PD 04-DEC-1997.

PF 30-MAY-1997; U09407.

PR 31-MAY-1996; US-018841.

PA (NAAM-) NAT AMERICAN RED CROSS.

PA (UYGE-) UNIV GENEVE. 

PI Maciag T, Montesano R, Pepper M, Wong MK, Zimrin AB; 

DR WPI; 98-032340/03. 

DR N-PSDB; V03674. 

PT New human Jagged protein - used to inhibit or promote angiogenesis

PT and to control migration of endothelial cells in injured blood

PT vessels

PS Claim 2; Page 54-61; 81pp; English.

CC This sequence comprises the human homologue of the rat Jagged

CC protein. Jagged is able to bind Notch protein and is involved in

CC endothelial cell (EC) migration and differentiation. The human

CC Jagged amino acid sequence was deduced from a human endothelial 

CC cell cDNA (see V03674) induced by exposure to fibrin. Jagged 

CC polypeptides can be expressed in host cell systems. A method for 

CC treating or preventing disease by administering an agent that 

CC (ant)agonises, inhibits, prevents, enhances or stimulates function 

CC of the Notch or Jagged proteins is claimed, as well as a method for 

CC affecting differentiation of mesoderm, endoderm, ectoderm and/or 

CC neuroderm cells. When Jagged is applied to a micro-diameter blood 

CC vessel from which ECs have been removed, damaged or reduced, it 

CC decreases migration of EC to the site, but when delivered to a

CC similar site on a large vessel it increases EC migration. Jagged 

CC and its agonists are used to inhibit or prevent angiogenesis (where 

CC associated with solid tumours, rheumatoid arthritis, inflammation, 

CC or restenosis, particularly preventing angiogenesis from the vaso 

CC vasorum and promoting large vessel EC migration to repair the lumen 

CC of large vessels). Anti-Jagged and Jagged antagonists (e.g. 

CC antisense Jagged and Jagged mutants) are used to promote or enhance 

CC angiogenesis, particularly for wound and injury repair, e.g. where 

CC surgical, traumatic and/or caused by disease, e.g. diabetes-related

CC (all claimed). Angiogenesis can be modulated in vitro or in vivo

CC and expression of proteins by gene therapy is included. Modulation 

CC of the Notch-Jagged signalling pathway may also be involved in 

CC placental development and motor neurone diseases such as 

CC amyotrophic lateral sclerosis, poliomyelitis etc. 

SQ Sequence 1208 AA; 

Query Match 8.9%; Score 484; DB 28; Length 1208;

Best Local Similarity 35.8%; Pred. No. 4.22e-25;

[Alignment largely illegible in source; Matches tallies lost. Db segments start at 444, 501, 561, 621; Qy segments start at 183, 243, 298, 352. The one legible pair of rows follows.]

Db 621 ndcesnpcrnggtcidgvnsykcicsdgwegayce 655

Qy 352 DDCKNVECQNGGSCVDGILSYDCLCRPGYAGQYCE 386



Search completed: Fri May 28 09:09:55 1999
Job time: 115 secs.

Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 09:10:14 1999; MasPar time 31.17 Seconds

944.872 Million cell updates/sec

Tabular output not generated.

Title: >US-09-191-647-9

Description: (1-735) from US09191647.pep

Perfect Score: 5438 

Sequence: 1 SNKNLTSFPSRIPFDTTELY TVHIIRQCQCEPTKSVLSEK 735 

Scoring table: PAM 150 
Gap 11 

Searched: 122810 seqs, 40068593 residues 

Post-processing: Minimum Match 0% 

Listing first 45 summaries 



Database: pir60

1:pir1 2:pir2 3:pir3 4:pir4

Statistics: Mean 49.8; Variance 100.304; scale 0.497

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.
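
The throughput quoted in the run header is consistent with the usual definition of a cell update: one comparison of a query residue against a database residue. The Python sketch below works that out from the figures in this header (query length 735, 40068593 residues searched, 31.17 seconds). It assumes that definition; the function name is hypothetical.

```python
def million_cell_updates_per_sec(query_len, db_residues, seconds):
    """Query length x database residues, per second, in millions (assumed definition)."""
    return query_len * db_residues / seconds / 1e6


rate = million_cell_updates_per_sec(735, 40068593, 31.17)
print(f"{rate:.1f}")  # 944.8
```

The small gap from the reported 944.872 is what one would expect if the run time was rounded to two decimals before printing.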



                                   SUMMARIES

Result          % Query
   No.   Score    Match  Length  DB  ID       Description             Pred. No.

     1    1894     34.8    1469   2  B36665   slit protein 2 precur   0.00e+00
     2    1838     33.8    1480   2  A36665   slit protein 1 precur   0.00e+00
     3    1188     21.8     530   2  A31640   epidermal growth fact   9.23e-210
     4     588     10.8    1064   2  A40136   fibropellin Ia - sea    2.04e-87
     5     559     10.3     570   2  A48836   fibropellin C precurs   1.13e-81
     6     557     10.2    2703   2  A24420   notch protein - fruit   2.80e-81
     7     548     10.1    2437   2  S42612   transmembrane protein   1.67e-79
     8     538      9.9     861   2  A48825   Notch homolog Motch p   1.56e-77
     9     540      9.9    1203   2  A49175   Motch B protein - mou   6.30e-78
    10     540      9.9    2524   2  A35844   Xotch protein - Afric   6.30e-78
    11     533      9.8     728   2  I50719   C-Delta-1 - chicken     1.50e-76
    12     527      9.7     722   2  I48324   DELTA-like 1 - mouse    2.27e-75
    13     529      9.7    2139   2  A35672   crumbs protein - frui   9.19e-76
    14     527      9.7    2471   2  A49128   cell-fate determining   2.27e-75
    15     520      9.6     832   2  A31246   neurogenic protein De   5.38e-74
    16     521      9.6     833   2  S19087   gene Delta protein pr   3.42e-74
    17     520      9.6     880   2  S00670   gene Delta protein pr   5.38e-74
    18     520      9.6    2531   2  S18188   notch protein homolog   5.38e-74
    19     518      9.5    2531   2  A46019   gene Notch-1 protein    1.33e-73
    20     505      9.3    2318   2  S45306   notch 3 protein - mou   4.67e-71
    21     508      9.3    2555   2  A40043   notch protein homolog   1.21e-71
    22     486      8.9    2321   2  S78549   notch3 protein - huma   2.39e-67
    23     472      8.7     293   2  B26637   neurogenic repetitive   1.26e-64
    24     474      8.7    1220   2  A56136   jagged protein precur   5.16e-65
    25     434      8.0    1429   2  S06434   homeotic protein lin-   2.79e-57
    26     417      7.7     383   2  B45484   delta-like dlk homeot   5.08e-54
    27     417      7.7     383   2  S53716   homeotic protein dlk    5.08e-54
    28     416      7.6     259   2  S48713   fetal antigen 1 - hum   7.90e-54
    29     412      7.6    1404   2  A36666   serrate protein precu   4.59e-53
    30     412      7.6    1408   2  S16148   gene serrate protein    4.59e-53
    31     406      7.5     260   2  A44549   fetal antigen 1 homeo   6.42e-52
    32     401      7.4     385   2  S53718   homeotic protein dlk    5.75e-51
    33     391      7.2     385   2  A54785   preadipocyte factor 1   4.57e-49
    34     391      7.2    1295   2  A32901   glp1 protein precurso   4.57e-49
    35     384      7.1     387   2  B49175   Motch A protein - mou   9.69e-48
    36     380      7.0     200   2  A26637   neurogenic repetitive   5.53e-47
    37     380      7.0     473   2  A56175   adhesive plaque prote   5.53e-47
    38     318      5.8    5147   1  IJFFTM   cadherin-related tumo   2.02e-35
    39     307      5.6     102   2  B55885   chondroitin sulfate p   2.09e-33
    40     304      5.6    2397   2  A55535   versican precursor -    7.35e-33
    41     307      5.6    2409   2  A60979   versican precursor -    2.09e-33
    42     300      5.5     862   2  S43922   versican - pig-tailed   3.93e-32
    43     292      5.4    3562   2  A47171   chondroitin sulfate p   1.11e-30
    44     286      5.3    1091   2  A58532   glial cell membrane g   1.34e-29
    45     285      5.2     886   2  A57172   probable hormone rece   2.03e-29
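
The % Query Match column in the summaries is consistent with each hit's score expressed as a percentage of the Perfect Score (5438) given in the run header. The Python sketch below shows that calculation. It assumes this relationship, which is not stated in the report itself; the function and constant names are hypothetical.

```python
PERFECT_SCORE = 5438  # "Perfect Score" from the run header


def query_match_percent(score, perfect_score=PERFECT_SCORE):
    """Score as a percentage of the perfect (self-match) score, to one decimal."""
    return round(100.0 * score / perfect_score, 1)


# Scores of the first three summary rows
for score in (1894, 1838, 1188):
    print(score, query_match_percent(score))
```

For these three scores the sketch gives 34.8, 33.8 and 21.8, matching the listed values.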



RESULT 1

ENTRY B36665 #type complete
TITLE slit protein 2 precursor - fruit fly (Drosophila
melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change
16-Dec-1998
ACCESSIONS B36665
REFERENCE A36665
#authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.;
Artavanis-Tsakonas, S.
#journal Genes Dev. (1990) 4:2169-2187
#title slit: an extracellular protein necessary for development of
midline glia and commissural axon pathways contains both
EGF and LRR domains.
#cross-references MUID:91099665
#accession B36665
##status preliminary
##molecule_type mRNA
##residues 1-1469 ##label ROT
##cross-references GB:X53959
GENETICS
#gene FlyBase:sli
##cross-references FlyBase:FBgn0003425
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF
homology; leucine-rich alpha-2-glycoprotein repeat
homology; proteoglycan carboxyl-terminal homology
FEATURE
66-91 #domain proteoglycan amino-terminal homology #label PAH1\
101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\
288-313 #domain proteoglycan amino-terminal homology #label PAH2\
323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\
512-537 #domain proteoglycan amino-terminal homology #label PAH3\
547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\
708-733 #domain proteoglycan amino-terminal homology #label PAH4\
743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\
1028-1061 #domain EGF homology #label EGF

SUMMARY #length 1469 #molecular-weight 164695 #checksum 8361



Query Match 34.8%; Score 1894; DB 2; Length 1469; 

Best Local Similarity 40.7%; Pred. No. 0.00e+00;

Matches 307; Conservative 159; Mismatches 240; Indels 49; Gaps 28; 



Db 729 SRNQLKEIPRGIPAETSELYLESNEIEQIHYERIRHLRSLTRLDLSNNQITILSNYTFAN 788
1:1 :l II :|:llll::l |::| : :| lll:||||:|:: 1 1 l|:|
Qy 1 SNKNLTSFPSRIPFDTTELYLDANYINEIPAHDLNRLYSLTKLDLSHNRLISLENNTFSN 60

Db 789 LTKLSTLIISYNRLQCLQRHALSGLNNLRWSLHGNRISMLPEGSFEDLKSLTHIALGSN 848
Mlllllllllhll ::||! |;:|||l! 1 l|:::| :| MIIMII
Qy 61 LTRLSTLIISYNRLRCLQPLAFNGLNALRILSLHGNDISFLPQSAFSNLTSITHIAVGSN 120

Db 849 PLYCDCGLKWFSDWIKLDYVEPGIARCAEPEQMKDKLILSTPSSSFVCRGRVRNDILAKC 908
:|illl : III III ::|:||||| |: : : |:|:: : I I ::| : :||
Qy 121 SLYCDCNMAWFSKWIKSKFIEAGIARCEYPNTVSNQLLLTAQPYQFTCDSKVPTKLATKC 180

Db 909 NACFEQPCQNQAQCVALPQREYQCLCQPGYHGKHCEFMIDACYGNPCRNNATCTVLEEGR 968
: h: II 1 1 1 1 1 1 1 II: 1 III lllllhll Mill 1 : II
Qy 181 DLCLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGR 240

Db 969 FSCQCAPGYTGARCETNIDDCL-G- -EI --KCQNNAT-C- - -I-D-G- "VESYKCECQP 1014
h! 1 1: 1 II Mill: : 1 II : 1 : : : ;:||;|:|
Qy 241 FNCYCNRGFEGDYCERNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPM 300

Db 1015 GFSGEFCDTRIQFCSPEFNPCANGAKCMDHFTHYSCDCQAGFHGTNCTDNIDDCQNHMCQ 1074
: 1 1: 1-1: :lll 1 :||: III 1 :M 1 II Mill 1 II
Qy 301 EYEGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQ 360

Db 1075 NGGTCVDGIKDYOCRCPDDYIGKYCEGHNMISMMYPQTSPCQNHECKHGVCFQPNAQGSD 1134
11:11111 1:1 1 1:1 III Ml I :|| Ml :: :||
Qy 361 NGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECV-ASQNSSD 419

Db 1135 YLCRCHPGYTGKWCEYLTSISFVHNNSFVELEPLRTRPEANVTIVFSSAEQNGILMYDGQ 1194
: 1:11 |::| h |::| : ::: |:|| : ::: :|: : :: |||:| |:
Qy 420 FTCRCHEGFSGPSCDRQMSVGFKNPGAYLALDPL-A-SDGTITMTLRTTSRIGILLYYGD 477

Db 1195 DAHLAVELFKGRIRVSYDVGNHPVSTMYSFEMVADGKYHAVEL-LAIKKNFTLRVDRGLA 1253
1 :: M::||::: Mill III :| II 1 : : : :| | |::|:
Qy 478 DHFVSAELYDGRVKLVYYIGNFPASHMYSSVKVNDGLPHRISIRTSERKCF-LQIDKNPV 536

Db 1254 RSIINEGSNDYLKLT-TPM-FLGGLPVDPAOQAYKNWQIRNLTSFKGCMKEWINHKLVD 1311
: : I M I I ::||M:: :|:| : :::| |:|||: : II I 
Qy 537 QIVENSGKSDQLITKGKEMLYIGGLPIEKSQDAKRRFHVKNSESLKGCISSITINE-VP 594 

Db 1312 FGNAQRQQKITPGCALLEGEQQEEEDDEQDFMDETPHIREEPVDPCLENRCRRGSRCVPN 1371 

: I II II I: : II I :: MM Ml 
Qy 595 I-NLQ-Q AL ENVNTEQSC - SATVNFCAG - ID - C - GN — G * RCTNN 630 

Db 1372 SNARDGYQCKCKHGQRGRYCDQAASTCRKEQVREYYTENDCRSRQPLKYAKCVGGCG-NQ 1430 

: IN Ml: MM 11:111 : :| I I I II :| 
Qy 631 ALSPKGYMCQCDSHFSGEHCDEKRIRCDKQKFRRHHIENECRSVDRIRIAECNGYCGGEQ 690 

Db 1431 -CCAAKIVRRRKVRMVCSNNRKYIKNLDIVRRCGC 1464 

11:1 ::|lhl:| I M |:| I I 
Qy 691 NCCTAVKKKQRKVRMICKNGTTRISTVHIIRQCQC 725 
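
The alignment above is the kind of result a Smith-Waterman local alignment produces. The toy Python sketch below shows only the core recurrence. It assumes a flat match/mismatch score and a linear gap penalty rather than the PAM 150 table and gap setting used in this run, so its scores are not comparable with the ones reported here; all names and parameter values are illustrative.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with toy scoring and a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for ca in a:
        curr = [0]  # column 0 is always 0 in a local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Clamp at 0 so a local alignment can start anywhere
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Two short segments taken from the alignment above (Qy 423-433, Db 1138-1148)
print(smith_waterman_score("CHEGFSGPSCD", "CHPGYTGKWCE"))
```

Only two rows of the matrix are kept, which is enough for the score; recovering the aligned residues would need the full matrix and a traceback.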



RESULT 2 

ENTRY A36665 #type complete

TITLE slit protein 1 precursor - fruit fly (Drosophila

melanogaster)

ORGANISM #formal_name Drosophila melanogaster

DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change

24-Sep-1998
ACCESSIONS A36665; S13523
REFERENCE A36665

#authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.;

Artavanis-Tsakonas, S.
#journal Genes Dev. (1990) 4:2169-2187

#title slit: an extracellular protein necessary for development of
midline glia and commissural axon pathways contains both
EGF and LRR domains.
#cross-references MUID:91099665
#accession A36665

##status preliminary
##molecule_type mRNA
##residues 1-1480 ##label ROT
##cross-references GB:X53959; NID:g8614; PID:g8615
GENETICS

#gene FlyBase:sli
##cross-references FlyBase:FBgn0003425
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF
homology; leucine-rich alpha-2-glycoprotein repeat
homology; proteoglycan carboxyl-terminal homology



KEYWORDS alternative splicing
FEATURE
66-91 #domain proteoglycan amino-terminal homology #label PAH1\
101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\
288-313 #domain proteoglycan amino-terminal homology #label PAH2\
323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\
512-537 #domain proteoglycan amino-terminal homology #label PAH3\
547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\
708-733 #domain proteoglycan amino-terminal homology #label PAH4\
743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
791-814 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR17\
815-838 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR18\
846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\
1028-1061 #domain EGF homology #label EGF

SUMMARY #length 1480 #molecular-weight 165751 #checksum 900



Query Match 33.8%; Score 1838; DB 2; Length 1480; 

Best Local Similarity 41.8%; Pred. No. 0.00e+00;

Matches 257; Conservative 143; Mismatches 194; Indels 21; Gaps 15; 

Db 729 SRNQLREIPRGIPAETSELYLESNEIEQIHYERIRHLRSLTRLDLSNNQITILSNYTFAN 788 

1:1:111 :|:||||:;| |::| : :| Mhll!l:|:: I I Ihl 
Qy 1 SNKNLTSFPSRIPFDTTELYLDANYINEIPAHDLNRLYSLTRLDLSHNRLISLENNTFSN 60 

Db 789 LTKLSTLIISYNKLQCLQRHALSGLNNLRWSLHGNRISMLPEGSFEDLKSLTHIALGSN 848 

IhllllllMllhlll |::||| 1 1 ||:::| :| |:||||:||| 

Qy 61 LTRLSTLIISYNRLRCLQPLAFNGLNALRILSLHGNDISFLPQSAFSNLTSITHIAVGSN 120 

Db 849 PLYCDCGLKWFSDWIKLDYVEPGIARCAEPEQMRDRLILSTPSSSFVCRGRVRNDILAKC 908 

illlll : III III ::|:||||| h : : hh: : | | ::| ; :|| 
Qy 121 SLYCDCNMAWFSKWIKSRFIEAGIARCEYPNTVSNQLLLTAQPYQFTCDSKVPTRLATRC 180 

Db 909 NACFEQPCQNQAQCVALPQREYQCLCQPGYHGKHCEFMIDACYGNPCRNNATCTVLEEGR 968 

• : !:: II I I I : : I I I I II: I II lllllhll Ml I : 
Qy 181 DLCLNSPCKNNAICETTSSRRYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGR 240

Db 969 FSCQCAPGYTGARCETNIDDCL-G-EI--KCQNNAT-C---I-D--G-VESYKCECQP 1014 

hi I I: I II Mill: : I II : I : : : ::||;|;| 
Qy 241 FNCYCNKGFEGDYCEKNIDDCVNSRCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPM 300 

Db 1015 GFSGEFCDT K IQFC SPEFNPCANGARCMDHFTHY SCDCQAGFHGTHCTDNIDDCQNHMCQ 1074 

: I I: :::!: :lll I :||: III I :|| I II III I 
Qy 301 EYEGRHCEDKLEYCTKRLNPCENNGRCIPINGSYSCMCSPGFTGNNCETNIDDCRNVECQ 360

Db 1075 NGGTCVDGINDYQCRCPDDYTGRYCEGHNMISMMYPQTSPCQNHECKHGVCFQPNAQGSD 1134 

Mhlllll 1:1 I hi III I :| I :| 

Qy 361 NGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECV-ASQNSSD 419 

Db 1135 YIOCHPGYTGKWCEYLTSISFVHNNSFVELEPLRTRPEANVTIVFSSAEQNGILMYDGQ 1194 

: hll h = l h |::| : :: : ::: :|: : :: |||:| |: 

Qy 420 FTCKCHEGFSGPSCDRQMSVGFKNPGAYLALDPL-A-SDGTITMTLRTTSKIGILLYYGD 477 

Db 1195 DAHLAVELFNGRIRVSYDVGNHPVSTMYSFEMVADGKYHAVEL-LAIKKNFTLRVDRGLA 1253 

I :: !h:lh:: I :|| I I III :| || | : : : :| | |::|: 
Qy 478 DHFVSAELYDGRVRLVYYIGNFPASHMYSSVKVNDGLPHRISIRTSERKCF-LQIDRNPV 536 

Db 1254 RSIINEGSNDYLKLI-IPM-FLGGLPVDPAQQAYKNWQIRNLTSFKGCMKEVWINHKLVD 1311 

: : I I :| I I ::||||:: :|:| : :::| hllh : II :: 
Qy 537 QIVENSGKSDQLITKGKEMLYIGGLPIEKSQDAKRRFHVKNSESLKGCISSITINEVPIN 596 

Db 1312 FGNAQRQQKITPGCA 1326 

- : I : :h 



Qy 597 LQQALENVNTEQSCS 611 



RESULT 3
ENTRY A31640 #type fragment
TITLE epidermal growth factor-like protein slit - fruit fly
(Drosophila melanogaster) (fragment)
ORGANISM #formal_name Drosophila melanogaster
DATE 28-Feb-1990 #sequence_revision 28-Feb-1990 #text_change
14-Aug-1998
ACCESSIONS A31640
REFERENCE A31640
#authors Rothberg, J.M.; Hartley, D.A.; Walther, Z.;
Artavanis-Tsakonas, S.
#journal Cell (1988) 55:1047-1059
#title slit: An EGF-homologous locus of D. melanogaster involved in
the development of the embryonic central nervous system.
#cross-references MUID:89077533
#accession A31640
##molecule_type DNA
##residues 1-530 ##label ROT
##cross-references GB:M23543; NID:g340939; PID:g514357
GENETICS
#gene FlyBase:sli
##cross-references FlyBase:FBgn0003425
#introns 470/3
CLASSIFICATION #superfamily EGF homology
KEYWORDS growth factor
FEATURE
148-181 #domain EGF homology #label EGF
SUMMARY #length 530 #checksum 6330

Query Match 21.8%; Score 1188; DB 2; Length 530;

Best Local Similarity 36.9%; Pred. No. 9.23e-210;

Matches 171; Conservative 111; Mismatches 159; Indels 22; Gaps 16; 

Db 1 MKDKLILSTPSSSFVCRGRVRNDILAKCNACFEQPCQNQAQCVALPQREYQCLCQPGYHG 60 

: : hh: : I I ::| ::||: h : I II I I : : I I I I 1 1 : I 
Qy 153 VSNQLLLTAQPYQFTCDSKVPTKLATRCDLCLNSPCKNNAICETTSSRKYTCNCTPGFYG 212 

Db 61 KHCEFMIDACYGNPCRNNATCTVLEEGRFSCQCAPGYTGARCETNIDDCL-G--EI--KC 115 

III Nlllhll Mill I : M: | |: | || HIM: : | | 
Qy 213 VHCENQIDACYGSPCLNNATCKVAQAGRFNCYCNKGFEGDYCERNIDDCVNSKCENGGKC 272 

Db 116 QNNAT-C---I-D-G-VESYKCECQPGFSGEFCDTKIQFCSPEFNPCANGAKCMDHFT 166 

: I : : : ::|hhl : I |: |:::|: :||| I :|h 
Qy 273 VDLVRFCSEELRNFQSFQINSYRCDCPMEYEGRHCEDRLEYCTKKLNPCENNGKCIPING 332 

Db 167 HYSCDCQAGFHGTNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGRYCEGHNMIS 226 

III I HI I II Mill I 11111:1111 hi I |:| III |: 
Qy 333 SYSCMCSPGFTGNNCETNIDDCKNVECQNGGSCVDGILSYDCLCRPGYAGQYCEIPPMMD 392 

Db 227 MMYPQTSPCQNHECKHGVCFQPNAQGSDYLCRCHPGYTGRWCEYLTSISFVHNNSFVELE 286 

IIIMI I :| I :: :lh hll |::| h |::| : ::: |: 
Qy 393 MEYQKTDACQQSACGQGECV-ASQNSSDFTCRCHEGFSGPSCDRQMSVGFKNPGAYLALD 451 

Db 287 PLRTRPEANVTIVFSSGQNGILMYDGQDAHIAVELFNGRIRVSYDVGNHPVSTMYSFEM 345

II : =:: :h : ::: llhl hi :: l|::||::: I :|| I I III : 

Qy 452 PL-A-SDGTITMTLRTTSKIGILLYYGDDHFVSAELYDGRVKLVYYIGNFPASHMYSSVK 509

Db 346 VADGKYHAVEL-LAIKKNFTLRVDRGLARSIINEGSNDYLRLT-TPM-FLGGLPVDPAQQ 402

III I : : : :| I |::|: : : I I :| I I ::||||:: :|: 

Qy 510 VNDGLPHRISIRTSERRCF-LQIDRNPVQIVENSGRSDQLITRGKEMLYIGGLPIEKSQD 568 

Db 403 AYKNWQIRNLTSFKGCMKEVWINHKLVDFGNAQRQQKITPGCA 445

I : :::| hllh : II ::: I : :|: 
Qy 569 ARRRFHVKNSESLKGCISSITINEVPINLQQALENVNTEQSCS 611 



RESULT 4 

ENTRY A40136 #type complete
TITLE fibropellin Ia - sea urchin (Strongylocentrotus purpuratus)
ALTERNATE_NAMES epidermal growth factor homolog precursor
CONTAINS alternatively spliced fibropellin Ib (EGFI)
ORGANISM #formal_name Strongylocentrotus purpuratus #common_name
purple urchin
DATE 13-May-1992 #sequence_revision 17-Sep-1997 #text_change
07-Aug-1998
ACCESSIONS A40136; B40136; C40136; A29316; A43131
REFERENCE A40136
#authors Delgadillo-Reynoso, M.G.; Rollo, D.R.; Hursh, D.A.; Raff,
R.A.
#journal J. Mol. Evol. (1989) 29:314-327
#title Structural analysis of the uEGF gene in the sea urchin
Strongylocentrotus purpuratus reveals more similarity to
vertebrate than to invertebrate genes with EGF-like
repeats.
#cross-references MUID:90112459
#accession A40136
##status preliminary
##molecule_type mRNA
##residues 1-114 ##label DE1
##cross-references GB:X17530; NID:g10225; PID:g667061
#accession B40136
##status preliminary; not compared with conceptual translation
##molecule_type DNA
##residues 181-251,329-370,'R',372-408,'RA',411-441 ##label DE2
#accession C40136
##status preliminary; not compared with conceptual translation
##molecule_type DNA
##residues 'K',747-821,898-978 ##label DE3
REFERENCE A29316
#authors Hursh, D.A.; Andrews, M.E.; Raff, R.A.
#journal Science (1987) 237:1487-1490
#title A sea urchin gene encodes a polypeptide homologous to
epidermal growth factor.
#cross-references MUID:87319677
#accession A29316
##status preliminary
##molecule_type mRNA
##residues 'S',280-481,786-1064 ##label HUR
##cross-references GB:M17421; NID:g161474; PID:g552260
REFERENCE A43131
#authors Hunt, L.T.; Barker, W.C.
#journal FASEB J. (1989) 3:1760-1764
#title Avidin-like domain in an epidermal growth factor homolog from
a sea urchin.
#cross-references MUID:89196806
#contents annotation
COMMENT EGF homology repeats 10-17 are spliced out in the short form
(fibropellin Ib).

CLASSIFICATION #superfamily C1r/C1s repeat homology; EGF homology
FEATURE
1-19 #domain signal sequence #status predicted #label SIG\
20-1064 #product fibropellin I #status predicted #label FIB\
23-54 #domain EGF homology #label EG01\
57-175 #domain C1r/C1s repeat homology #label CSR\
180-211 #domain EGF homology #label EG02\
218-249 #domain EGF homology #label EG03\
256-287 #domain EGF homology #label EG04\
294-325 #domain EGF homology #label EG05\
332-363 #domain EGF homology #label EG06\
370-401 #domain EGF homology #label EG07\
408-439 #domain EGF homology #label EG08\
446-477 #domain EGF homology #label EG09\
484-515 #domain EGF homology #label EG10\
522-553 #domain EGF homology #label EG11\
560-591 #domain EGF homology #label EG12\
598-629 #domain EGF homology #label EG13\
636-667 #domain EGF homology #label EG14\
674-705 #domain EGF homology #label EG15\
712-743 #domain EGF homology #label EG16\
750-781 #domain EGF homology #label EG17\
788-819 #domain EGF homology #label EG18\
826-857 #domain EGF homology #label EG19\
864-895 #domain EGF homology #label EG20\



902-933 #domain EGF homology #label EG21\
936-1064 #region avidin-like\
23-34,28-43,45-54,
62-88,180-191,
185-200,202-211,
218-229,223-238,
240-249,256-267,
261-276,278-287,
294-305,299-314,
316-325,332-343,
337-352,354-363,
370-381,375-390,
392-401,408-419,
413-428,430-439,
446-457,451-466,
468-477,484-495,
489-504,506-515,
522-533,527-542,
544-553,560-571,
565-580,582-591,
598-609,603-618,
620-629,636-647,
641-656,658-667,
674-685,679-694,
696-705,712-723,
717-732,734-743,
750-761,755-770,
772-781,788-799,
793-808,810-819,
826-837,831-846,
848-857,864-875,
869-884,886-895,
902-913,907-922,
924-933




SUMMARY . 


f 



#disulfide_bonds #status predicted\
#disulfide_bonds #status predicted



Query Match 10.8%; Score 588; DB 2; Length 1064; 

Best Local Similarity 39.7%; Pred. No. 2.04e-87;



Matches 


Db 


216 


Qy 


181 


Db 


274 


Qy 


241 


Db 


319 


Qy 


301 


Db 


377 


Qy 


361 


Db 


427 


Qy 


421 



II : II I : I 



I III III I III I: I : M M I : I I 



hi I II I II Ml: :: Nil 



I : I: : II I I I: 



-I--C--I-D-G-INGYTCSCPL 318 

: I : : : IN I II: 



:| l:hll II I II: : 



III |:||: : I |:|||:| || :| I : :|| : | hi : |: 



I I III :|: 



RESULT 5
ENTRY A48836 #type complete
TITLE fibropellin C precursor - sea urchin (Strongylocentrotus
purpuratus)
ALTERNATE_NAMES EGF repeat-containing protein; epidermal growth
factor-related protein 3; fibropellin III
ORGANISM #formal_name Strongylocentrotus purpuratus #common_name
purple urchin
DATE 01-Dec-1993 #sequence_revision 18-Nov-1994 #text_change
07-Aug-1998
ACCESSIONS A48836
REFERENCE A48836
#authors Bisgrove, B.W.; Raff, R.A.
#journal Dev. Biol. (1993) 157:526-538
#title The SpEGF III gene encodes a member of the fibropellins: EGF
repeat-containing proteins that form the apical lamina of
the sea urchin embryo.
#cross-references MUID:93273088
#accession A48836
##status preliminary
##molecule_type mRNA
##residues 1-570 ##label BIS
##cross-references GB:L07045; NID:g310659; PID:g310660
##note sequence extracted from NCBI backbone (NCBIN:132724,
NCBIP:132725)
CLASSIFICATION #superfamily C1r/C1s repeat homology; EGF homology
FEATURE
1-18 #domain signal sequence #status predicted #label SIG\
19-570 #product fibropellin C #status predicted #label FIB\
19-54 #domain EGF homology #label EGF1\
57-175 #domain C1r/C1s repeat homology #label C1R\
176-211 #domain EGF homology #label EGF2\
214-249 #domain EGF homology #label EGF3\
252-287 #domain EGF homology #label EGF4\
290-325 #domain EGF homology #label EGF5\
328-363 #domain EGF homology #label EGF6\
366-401 #domain EGF homology #label EGF7\
404-439 #domain EGF homology #label EGF8\
442-570 #region avidin-like\
23-34,28-43,45-54,
62-88,180-191,
185-200,202-211,
218-229,223-238,
240-249,256-267,
261-276,278-287,
294-305,299-314,
316-325,332-343,
337-352,354-363,
370-381,375-390,
392-401,408-419,
413-428,430-439 #disulfide_bonds #status predicted
SUMMARY #length 570 #molecular-weight 61115 #checksum 5567

Query Match 10.3%; Score 559; DB 2; Length 570; 

Best Local Similarity 42.2%; Pred. No. 1.13e-81;



Matches 


I 


256 


Qy 


183 


Db 


314 


Qy 


243 


Db 


359 


Qy 


303 


Db 


417 


Qy 


363 



I : II I :|| 



:| I Ih I: II M I :IM I ::| I j: 



II II I II ll::l :| Mill I 



■-G-VDGYVCQCLPNY 358 

: :::| |:| :| 



III \- I: I 1 1 : 1 II Mill Mil III :|::| :: II 



I IMI I I II Ml II 



RESULT 6 

ENTRY A24420 #type complete
TITLE notch protein - fruit fly (Drosophila melanogaster)
ALTERNATE_NAMES neurogenic repetitive locus protein
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Jun-1987 #sequence_revision 30-Jun-1987 #text_change
07-Aug-1998
ACCESSIONS A24420; A24768; S09358; A05267
REFERENCE A24420
#authors Kidd, S.; Kelley, M.R.; Young, M.W.
#journal Mol. Cell. Biol. (1986) 6:3094-3108
#cross-references MUID:87064624
#accession A24420
##molecule_type DNA
##residues 1-2703 ##label KID
##cross-references GB:K03508; NID:g157991; PID:g157993
REFERENCE A24768
#authors Wharton, K.A.; Johansen, K.M.; Xu, T.; Artavanis-Tsakonas, S.
#journal Cell (1985) 43:567-581
#cross-references MUID:86079539
#accession A24768
##molecule_type mRNA
##residues 1-48,'T',50-118,'R',120-230,'I',232-256,'T',258-266,'A',
268-872,'R',874-958,'R',960-1970,'FH',1973-2256,'G',
2258-2264,'V',2266-2406,'R',2408-2444,'L',2446-2703
##label WHA1
##note the authors translated the codon ATC for residue 49 as
Thr, ATT for residue 2044 as Arg, GTA for residue 2265
as Ala, CGC for residue 2407 as His, and CTT for
residue 2445 as Arg

REFERENCE S09358
#authors Tautz, D.
#journal Nucleic Acids Res. (1989) 17:6463-6471
#title Hypervariability of simple sequences as a general source for
polymorphic DNA markers.
#cross-references MUID:89385974
#accession S09358
##molecule_type DNA
##residues 2505-2551,'QQQQ',2552-2576,'E',2578-2604 ##label TAU
REFERENCE A05267
#authors Wharton, K.A.; Yedvobnick, B.; Finnerty, V.G.;
Artavanis-Tsakonas, S.
#journal Cell (1985) 40:55-62
#title opa: a novel family of transcribed repeats shared by the
Notch locus and other developmentally regulated loci in D.
melanogaster.
#cross-references MUID:85099329
#accession A05267
##molecule_type DNA
##residues 2504-2576,'E',2578-2611 ##label WHA2
GENETICS
#gene notch; opa
##cross-references FlyBase:FBgn0004647
#map_position 8.96-9.36
#introns 53/3; 84/3; 171/3; 240/3; 283/3; 2333/3; 2436/3; 2588/3
CLASSIFICATION #superfamily notch protein; ankyrin repeat homology; EGF
homology
KEYWORDS differentiation; tandem repeat; transmembrane protein

FEATURE
27-43 #domain transmembrane #status predicted #label TMM1\
568-599 #domain EGF homology #label EGF\
1746-1762 #domain transmembrane #status predicted #label TMM2\
1950-1982 #domain ankyrin repeat homology #label AN1\
1983-2015 #domain ankyrin repeat homology #label AN2\
1988-2004 #domain transmembrane #status predicted #label TMM3\
2017-2049 #domain ankyrin repeat homology #label AN3\
2050-2082 #domain ankyrin repeat homology #label AN4\
2083-2115 #domain ankyrin repeat homology #label AN5\
2538-2568 #region glutamine-rich\
2538-2568 #domain neurogenic repetitive element #status predicted
#label OPA
SUMMARY #length 2703 #molecular-weight 288876 #checksum 6404

Query Match 10.2%; Score 557; DB 2; Length 2703; 

Best Local Similarity 38.1%; Pred. No. 2.80e-81;

Matches 96; Conservative 49; Mismatches 80; Indels 27; Gaps 17; 

Db 681 CHSNPCNNGATC-IDGINSYKCQCVPGFTGQHCEKNVDECISSPCANNGVC-IDQVNGYK 738 

I MM M : I II III I III: :| I :||| ||:| : | :: 
Qy 183 CLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFN 242 

Db 739 CECPRGFYDAHCLSDVDECA-S--NP--CVNEGR-CEDGI-N-E-F-I----CHCPPGY 783 

I Mil I M: | | ||: | I; M M I Ml I 
Qy 243 CYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEY 302 
Db 784 TGKRCELDIDECSS--NPCQHGGTCYDKLNAFSCQCMPGYTGQKCETNIDDCVTKPCGNG 841

11:11 - \- HI" I I -ll I Ihll : 1 1 1 1 1 1 ! I I II 
Qy 303 EGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNKCETNIDDCKNVECQNG 362 

Db 842 GTCIDKVNG YKCVCKVPFTGRDCE - - SKMD - PCA - SNRCK - N — EAKCT PSSNFLDFSC 893 

1:1:1 : :| |:|: ::|: II ::|| :: I : :: I :| I l|:| 
Qy 363 GSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNSSDFTC 422

Db 894 TCKLGYTGRYCD 905 

I I::| II 
Qy 423 KCHEGFSGPSCD 434 



RESULT 7 

ENTRY S42612 #type complete

TITLE r transmembrane protein precursor - zebra fish 
ORGANISM #formal_name Brachydanio rerio #common_name zebra fish
DATE 20-Feb-1995 #sequence_revision 20-Feb-1995 #text_change
10-Jul-1998
ACCESSIONS S42612
REFERENCE S42612
#authors Bierkamp, C.; Campos-Ortega, J.A.
#journal Mech. Dev. (1993) 43:87-100
#title A zebrafish homologue of the Drosophila neurogenic gene Notch
and its pattern of transcription during early
embryogenesis.
#cross-references MUID:94128602
#accession S42612
##status preliminary
##molecule_type mRNA
##residues 1-2437 ##label BIE
##cross-references EMBL:X69088; NID:g433866; PID:g433867
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology
FEATURE
1915-1947 #domain ankyrin repeat homology #label AN1\
1948-1980 #domain ankyrin repeat homology #label AN2\
1982-2014 #domain ankyrin repeat homology #label AN3\
2015-2047 #domain ankyrin repeat homology #label AN4\
2048-2080 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2437 #molecular-weight 262306 #checksum 4021

Query Match 10.1%; Score 548; DB 2; Length 2437; 

Best Local Similarity 39.6%; Pred. No. 1.67e-79; 



I I ::|||| III:: lllhlllll hill :|: I :IM : : |: : h 



III hi II 1 1 : : I : : I INI I I 



Matches 


Db 


528 


• 


181 




585 


Qy 


241 


Db 


630 


Qy 


301 


Db 


686 


Qy 


360 


Db 


735 


Qy 


420 



I h :lh III! .Ill I h! Ih:|: h Mill I 



HNGGTCIDGVNSFTCLC-PD-G-FRDATCL-S-Q-HN-E-CSSNPCIHGSCLD-QINS- 734 

:IH:|:||: h III I I : : : : :: : I ::| :| h I :| 



RESULT 8
ENTRY A48825 #type fragment
TITLE Notch homolog Motch protein - mouse (fragment)
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 01-Dec-1993 #sequence_revision 18-Nov-1994 #text_change
14-Aug-1998
ACCESSIONS A48825
REFERENCE A48825
#authors Reaume, A.G.; Conlon, R.A.; Zirngibl, R.; Yamaguchi, T.P.;
Rossant, J.
#journal Dev. Biol. (1992) 154:377-387
#title Expression analysis of a Notch homologue in the mouse embryo.
#cross-references MUID:93050801
#accession A48825
##status preliminary; not compared with conceptual translation
##molecule_type mRNA
##residues 1-861 ##label REA
##experimental_source embryo
##note sequence extracted from NCBI backbone (NCBIP:119144)
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology; EGF homology
FEATURE
26-57 #domain EGF homology #label EGF
SUMMARY #length 861 #checksum 7963

Query Match 9.9%; Score 538; DB 2; Length 861;

Best Local Similarity 37.3%; Pred. No. 1.56e-77;



I ::M I : I :| I I I II I :|: h I ::IMI :H 



Matches 


Db 


26 


Qy 


183 


Db 


84 


Qy 


243 


Db 


144 


Qy 


298 


Db 


202 


Qy 


358 



I II : I I I hi I III 



: ||: :: I: |||:| ::| ||:| |: I I :|| I II 



I 1 1 1 : 1 1 1 1 1 I: III !:: ||: 



RESULT 9 

ENTRY A49175 #type fragment
TITLE Motch B protein - mouse (fragment)
ALTERNATE_NAMES Notch homolog
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 21-Jan-1994 #sequence_revision 05-Jan-1996 #text_change
14-Aug-1998
ACCESSIONS A49175; PH1570; S32113
REFERENCE A49175
#authors Lardelli, M.; Lendahl, U.
#journal Exp. Cell Res. (1993) 204:364-372
#title Motch A and Motch B--two mouse Notch homologues coexpressed
in a wide variety of tissues.
#cross-references MUID:93178563
#accession A49175
##status preliminary; nucleic acid sequence not shown
##molecule_type mRNA
##residues 1-1203 ##label LAR
##cross-references EMBL:X68279; NID:g287989; PID:g287990
##experimental_source embryo
##note sequence extracted from NCBI backbone (NCBIP:126158)
COMMENT This protein has many EGF repeats and lin-12/Notch repeats.
COMMENT This protein is one of the neurogenic proteins controlling the
decision between ectodermal and neural fate for cells in the early
embryo.
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology; EGF homology
FEATURE
560-591 #domain EGF homology #label EGF
SUMMARY #length 1203 #checksum 910

Query Match 9.9%; Score 540; DB 2; Length 1203;

Best Local Similarity 38.8%; Pred. No. 6.30e-78;

Conservative 50; Mismatches 57; Indels 19;



I I : Ihl III |: l|::|: I ::|| I ::| I : 



Matches 


Db 


558 


Qy 


181 


Db 


616 


Qy 


241 


Db 


661 


Qy 


301 


Db 


719 


1 


361 



1:1 I: II II I: ""I" I III I 



--Y-VNSYTCTCPA 660 

: :||| I II 



: I III:::: II ::'l I I h |:||:|: Mil I :|::| : I 



1:1:1111: :| hi Ihl h 



RESULT 10 

ENTRY A35844 #type complete
TITLE Xotch protein - African clawed frog
ORGANISM #formal_name Xenopus laevis #common_name African clawed frog
DATE 12-Oct-1990 #sequence_revision 12-Oct-1990 #text_change
14-Aug-1998
ACCESSIONS A35844
REFERENCE A35844
#authors Coffman, C.; Harris, W.; Kintner, C.
#journal Science (1990) 249:1438-1441
#title Xotch, the Xenopus homolog of Drosophila notch.
#cross-references MUID:90385285
#accession A35844
##status preliminary; nucleic acid sequence not shown; not
compared with conceptual translation
##molecule_type mRNA
##residues 1-2524 ##label COF
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology; EGF homology
KEYWORDS transmembrane protein
FEATURE
222-254 #domain EGF homology #label EGF\
1924-1956 #domain ankyrin repeat homology #label AN1\
1957-1989 #domain ankyrin repeat homology #label AN2\
1991-2023 #domain ankyrin repeat homology #label AN3\
2024-2056 #domain ankyrin repeat homology #label AN4\
2057-2089 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2524 #molecular-weight 274931 #checksum 9441

Query Match 9.9%; Score 540; DB 2; Length 2524;

Best Local Similarity 37.7%; Pred. No. 6.30e-78;



I I I :|| I :h |: I ::||||::|| 



Matches 


Db 


757 


Qy 


183 


Db 


815 


Qy 


243 


Db 


862 


Qy 


303 


Db 


920 


Qy 


363 


Db 


970 


Qy 


423 



II : III : I I I llh : 



II: I::| I IMI 



QGQTCEIDMNECVNR-PCRNGATCQNTNGSYKCNCKPGYTGRNCEMDIDDCQPNPCHNG 919 
:| II :: I :: II I : I Mil I I INI III :llll hi 



III III : I I :h I II 



I :::! :| :|| II :|| 



RESULT 11 

ENTRY I50719 #type complete
TITLE C-Delta-1 - chicken
ORGANISM #formal_name Gallus gallus #common_name chicken
DATE 13-Sep-1996 #sequence_revision 13-Sep-1996 #text_change
14-Aug-1998
ACCESSIONS I50719
REFERENCE I50719
#authors Henrique, D.; Adam, J.; Myat, A.; Chitnis, A.; Lewis, J.;
Ish-Horowicz, D.
#journal Nature (1995) 375:787-790
#title Expression of a Delta homologue in prospective neurons in the
chick.
#cross-references MUID:95319507
#accession I50719
##status preliminary; translated from GB/EMBL/DDBJ
##molecule_type mRNA
##residues 1-728 ##label HEN
##cross-references EMBL:U26590; NID:g882411; PID:g882412
CLASSIFICATION #superfamily EGF homology
FEATURE
454-485 #domain EGF homology #label EGF
SUMMARY #length 728 #molecular-weight 79861 #checksum 1765

Query Match 9.8%; Score 533; DB 2; Length 728;

Best Local Similarity 38.7%; Pred. No. 1.50e-76;

Conservative 40; Mismatches 68; Indels 17; Gaps 12;



Mil I I I: lh l|: I |,| :|: I ;:|| I ::| 



:il l::!l: :|| I : |: : II I I : 1 1 1 1 :|; hll : I llh 



Matches 


Db 


302 


Qy 


185 


Db 


361 


Qy 


245 


Db 


407 


Qy 


305 


Db 


465 


Qy 


365 



Ml: I I I III I I I 



RESULT 12 

ENTRY I48324 #type complete
TITLE DELTA-like 1 - mouse
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 02-M-1996 #sequence_revision 02-M-1996 #text_change
28-Feb-1997
ACCESSIONS I48324
REFERENCE I48324
#authors Bettenhausen, B.; de Angelis, M.H.; Simon, D.; Guenet, J.L.;
Gossler, A.
#journal Development (1995) 121:2407-2418
#title Transient and restricted expression during mouse
embryogenesis of Dll1, a murine gene closely related to
Drosophila Delta.
#cross-references MUID:95401858
#accession I48324
##status preliminary; translated from GB/EMBL/DDBJ
##molecule_type mRNA
##residues 1-722 ##label RES
##cross-references EMBL:X80903; NID:g806569; PID:g806570
GENETICS
#gene Dll1
SUMMARY #length 722 #molecular-weight 78448 #checksum 1452

Query Match 9.7%; Score 527; DB 2; Length 722; 

Best Local Similarity 35.6%; Pred. No. 2.27e-75; 

Matches 89; Conservative 54; Mismatches 80; Indels 27; Gaps 20; 
Db


294 


Oy 


185 


Db 


353 


oy 


245 


Db 


399 


Qy 


305 


Db 


457 


Qy 


365 


Db 


507 


Qy 


425 


11:1 I I : l|: I :|| ::| I III ||:| : : |:| 

:| I : I II: I 

:M I:: I :|| I :||: : || | | :||:| || |:||| : | |||: 

II:: : I I llhl I :|: 

I : I I 

RESULT 13
ENTRY A35672 #type complete
TITLE crumbs protein - fruit fly (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 21-Sep-1990 #sequence_revision 18-Nov-1992 #text_change
14-Aug-1998
ACCESSIONS A35672
REFERENCE A35672
#authors Tepass, U.; Theres, C.; Knust, E.
#journal Cell (1990) 61:787-799
#title crumbs encodes an EGF-like protein expressed on apical
membranes of Drosophila epithelial cells and required for
organization of epithelia.
#cross-references MUID:90263104
#accession A35672
##status preliminary
##molecule_type mRNA
##residues 1-2139 ##label TEP
##cross-references GB:M33753
##note the authors translated the codon GGC for residue 1928 as
Cys, and TAT for residue 2023 as Gln
GENETICS
#gene FlyBase:crb
##cross-references FlyBase:FBgn0000368
CLASSIFICATION #superfamily EGF homology



KEYWORDS transmembrane protein
FEATURE
691-722 #domain EGF homology #label EGF
SUMMARY #length 2139 #molecular-weight 233619 #checksum 7230

Query Match 9.7%; Score 529; DB 2; Length 2139;

Matches

Db 747
Qy 166
Db 805
Qy 221
Db 864
Qy 278
Db 922
Qy 330
Db 980



I: II I : I : : |:|: I: I :|| II 
- -CLNSPCKNNAICETTSSRKYTCNCT-PGFYGVHCENQID 220 



:|| II I : 



I I::|: I II :|::| :: |: |:|:: 



:| I I 1:1 : I I M ||:|| ::: | II ::]: 



1:1:1 I III I :|| III: 



II I :| M III I : 1:1 I : 



- LL - KGCDO - NPCLNGG AC - LP - YLINEVTHLYTCTCENGFQGDKCEKTTTLS -MVAT SL 1033 



:: : : ::| ; || : :: : ;|| | :|| | |:: :::::: 
Qy 389 PMMDMEYQKTDACQQS-ACGQGECVASQNSSDFTCKCHEGFSGPSCDRQMSVGFKNPGAY 447 

Db 1034 ISVTTEREEGYDINLQFRTTLPNGVLAF 1061 

:| I : :||| hi : 
Qy 448 LALDPLASDG-TITMTLRTTSKIGILLY 474



RESULT 14 

ENTRY A49128 #type complete
TITLE cell-fate determining gene Notch2 protein - rat
ORGANISM #formal_name Rattus norvegicus #common_name Norway rat
DATE 21-Jan-1994 #sequence_revision 18-Nov-1994 #text_change
14-Aug-1998
ACCESSIONS A49128
REFERENCE A49128
#authors Weinmaster, G.; Roberts, V.J.; Lemke, G.
#journal Development (1992) 116:931-941
#title Notch2: a second mammalian Notch gene.
#cross-references MUID:93202015
#accession A49128
##status preliminary; not compared with conceptual translation
##molecule_type mRNA
##residues 1-2471 ##label WEI
##experimental_source Schwann cell
##note sequence extracted from NCBI backbone (NCBIP:127811)
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology; EGF homology
FEATURE
1029-1060 #domain EGF homology #label EGF\
1876-1908 #domain ankyrin repeat homology #label AN1\
1909-1941 #domain ankyrin repeat homology #label AN2\
1943-1975 #domain ankyrin repeat homology #label AN3\
1976-2008 #domain ankyrin repeat homology #label AN4\
2009-2041 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2471 #molecular-weight 265367 #checksum 5929

Query Match 9.7%; Score 527; DB 2; Length 2471;

Best Local Similarity 39.3%; Pred. No. 2.27e-75;



I I- H:ll:ll I : I hi III V l|::|: I ::|l I ::| I 



Matches 


Db 


875 


Qy 


181 


Db 


933 


Qy 


241 


Db 


978 


Qy 


301 


Db 


1036 


Qy 


361 



hi I II II h 



I III I 



-SD Y-VNSYTCTCPA 977 

I: : :lll I II 



I III:::: II ::l I I h hlhh III! I :|::l : 



hhlllh : I I Ihl |: 



RESULT 15 

ENTRY A31246 #type complete
TITLE neurogenic protein Delta precursor - fruit fly (Drosophila
melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 31-Mar-1990 #sequence_revision 31-Mar-1990 #text_change
14-Aug-1998
ACCESSIONS A31246
REFERENCE A31246
#authors Kopczynski, C.C.; Alton, A.K.; Fechtel, K.; Kooh, P.J.;
Muskavitch, M.A.T.
#journal Genes Dev. (1988) 2:1723-1735
#title Delta, a Drosophila neurogenic gene, is transcriptionally
complex and encodes a protein related to blood coagulation
factors and epidermal growth factor of vertebrates.
#cross-references MUID:89196890
#accession A31246
##molecule_type mRNA
##residues 1-832 ##label KOP
##cross-references GB:Y00222
GENETICS
#gene FlyBase:Dl
##cross-references FlyBase:FBgn0000463
CLASSIFICATION #superfamily EGF homology
FEATURE
457-488 #domain EGF homology #label EGF
SUMMARY #length 832 #molecular-weight 88943 #checksum 636

Query Match 9.6%; Score 520; DB 2; Length 832;

Best Local Similarity 37.1%; Pred. No. 5.38e-74;

Matches 95; Conservative 55; Mismatches 77; Indels 29; Gaps 22;

Db 295 CTNHRPCKNGGTCFNTGEGLYTCKCAPGYSGDDCENEIYSCDADVNPCQNGGTCIDEPHT 354
I I llll : I I: llhhll: I llhl :| : :|| I :|| ::: 
Qy 183 CLN-SPCKNNAICETTSSRRYTCNCTPGFYGVHCENQIDACYG-SPCLNNATC-KVAQA 238 

Db 355 KTGYKCHCRNGWSGKMCEEKVLTCSDRPCHQG-ICRN-VR--PG-LGS-KG-Q--GYQCE 405 

::| I :| I II :: I : I I I : II : I : : I :|:|: 
Qy 239 GR-FNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCD 297 

Db 406 CPIGYSGPNCDLQLDNCSP--NPCINGGSCQP-SGK--CICPSGFSGTRCETNIDDCLGH 460

II: I I :h h h III II I I :| hh:||:| I ! 1 1 1 1 1 1 
Qy 298 CPMEYEGRHCEDKLEYCTKRLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCRNV 357 



Db 461 QCENGGTCIDMVNQYRCQCVPGFHGTHC--SSRVDL-CL-IRPC--AN-G-GTCL-NLNN 511

:|:IM:|:| : I I I l|: I I ::::|: :| : I I |: : |: 
Qy 358 ECQNGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNS 417

Db 512 -DYQCTCRAGFTGKDC 526
I: I h ll:| I 
Qy 418 SDFTCRCHEGFSGPSC 433 



Search completed: Fri May 28 09:11:33 1999 
Job time: 79 secs.
Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 09:11:51 1999; MasPar time 21.65 Seconds 

959.467 Million cell updates/sec 

Tabular output not generated. 

Title: >US-09-191-647-9

Description: (1-735) from US09191647.pep

Perfect Score: 5438 

Sequence: 1 SNKNLTSFPSRIPFDTTELY TVHIIRQCQCEPTKSVLSEK 735 

Scoring table: PAM 150 
Gap 11 



77977 seqs, 28268293 residues 
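
As a rough consistency check on the throughput figure in this header: a Smith-Waterman search fills one dynamic-programming cell for each pairing of a query residue with a database residue, so the cell count would be the query length times the database size. The sketch below assumes that definition of a "cell update" (the log does not spell it out) and uses only values reported in this header; the variable names are illustrative.

```python
# Values copied from the run header of this search.
query_length = 735            # residues in the query sequence (1-735)
database_residues = 28268293  # total residues reported for the database
maspar_seconds = 21.65        # reported MasPar time

# Assumption: one "cell update" is one query-residue x database-residue
# cell of the Smith-Waterman dynamic-programming matrix.
cells = query_length * database_residues
rate_millions = cells / maspar_seconds / 1e6

print(f"{cells} cells, about {rate_millions:.1f} million cell updates/sec")
```

Under that assumption the rate comes out near 960 million cell updates per second, close to the reported 959.467; the small difference is presumably due to rounding of the reported time.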

Post-processing: Minimum Match 0%

Listing first 45 summaries 

Database: swiss-prot37 
1:swissprot

Statistics: Mean 51.090; Variance 86.159; scale 0.593

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
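
In the summaries that follow, the Query Match column appears to be each hit's score expressed as a percentage of the Perfect Score reported in the run header (5438); for example, 1838 / 5438 gives 33.8. A minimal sketch of that relation is given here, assuming this reading of the column is correct. The function name and the rounding to one decimal place are illustrative choices, not part of the search program.

```python
# Hypothetical helper: relates a hit's score to the "Perfect Score"
# printed in the run header. The relation is inferred from the listed
# values, not taken from the program's documentation.
PERFECT_SCORE = 5438  # "Perfect Score" reported for the 735-residue query


def query_match_percent(score, perfect_score=PERFECT_SCORE):
    """Return the hit score as a percentage of the perfect score."""
    if perfect_score <= 0:
        raise ValueError("perfect_score must be positive")
    return round(100.0 * score / perfect_score, 1)


if __name__ == "__main__":
    # Scores of the first three summaries in this run.
    for result_no, score in [(1, 1838), (2, 588), (3, 559)]:
        print(result_no, score, query_match_percent(score))
```

The same calculation can be used as a quick consistency check when a Score or Query Match value in the listing is hard to read.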



               Query
 No.  Score    Match  Length  DB  ID           Description              Pred. No.

   1   1838     33.8    1480   1  SLIT_DROME   SLIT PROTEIN PRECURSOR   0.00e+00
   2    588     10.8    1064   1  FBP1_STRPU   FIBROPELLIN I PRECURSO   4.02e-104
   3    559     10.3     570   1  FBP3_STRPU   FIBROPELLIN C PRECURSO   2.70e-97
   4    557     10.2    2703   1  NOTC_DROME   NEUROGENIC LOCUS NOTCH   7.95e-97
   5    548     10.1    2437   1  NOTC_BRARE   NEUROGENIC LOCUS NOTCH   1.03e-94
   6    542     10.0    2524   1  NOTC_XENLA   NEUROGENIC LOCUS NOTCH   2.61e-93
   7    527      9.7     722   1  DLL1_MOUSE   DELTA-LIKE PROTEIN 1 P   8.40e-90
   8    529      9.7     723   1  DLL1_HUMAN   DELTA-LIKE PROTEIN 1 P   2.87e-90
   9    529      9.7    2139   1  CRB_DROME    CRUMBS PROTEIN PRECURS   2.87e-90
  10    520      9.6     880   1  DL_DROME     NEUROGENIC LOCUS DELTA   3.61e-88
  11    520      9.6    2531   1  NTC1_RAT     NEUROGENIC LOCUS NOTCH   3.61e-88
  12    518      9.5    2531   1  NTC1_MOUSE   NEUROGENIC LOCUS NOTCH   1.06e-87
  13    508      9.3     714   1  DLL1_RAT     DELTA-LIKE PROTEIN 1 P   2.25e-85
  14    505      9.3    2318   1  NTC3_MOUSE   NEUROGENIC LOCUS NOTCH   1.12e-84
  15    506      9.3    2444   1  NTC1_HUMAN   NEUROGENIC LOCUS NOTCH   6.56e-85
  16    481      8.8    1964   1  NTC4_MOUSE   NEUROGENIC LOCUS NOTCH   4.10e-79
  17    434      8.0    1429   1  LI12_CAEEL   LIN-12 PROTEIN PRECURS   2.59e-68
  18    417      7.7     383   1  DLK_HUMAN    DELTA-LIKE PROTEIN PRE   1.93e-64
  19    412      7.6    1408   1  SERR_DROME   SERRATE PROTEIN PRECUR   2.63e-63
  20    401      7.4     385   1  DLK_MOUSE    DELTA-LIKE PROTEIN PRE   8.14e-61
  21    391      7.2    1295   1  GLP1_CAEEL   GLP-1 PROTEIN PRECURSO   1.47e-58
  22    318      5.8    5147   1  FAT_DROME    CADHERIN-RELATED TUMOR   2.24e-42
  23    304      5.6    3358   1  PGCV_MOUSE   VERSICAN CORE PROTEIN    2.44e-39
  24    307      5.6    3396   1  PGCV_HUMAN   VERSICAN CORE PROTEIN    5.47e-40
  25    300      5.5     862   1  PGCV_MACNE   VERSICAN CORE PROTEIN    1.78e-38
  26    292      5.4    3562   1  PGCV_CHICK   VERSICAN CORE PROTEIN    9.29e-37
  27    285      5.2     886   1  EMR1_HUMAN   CELL SURFACE GLYCOPROT   2.91e-35
  28    276      5.1    1257   1  PGCN_RAT     NEUROCAN CORE PROTEIN    2.38e-33
  29    275      5.1    1394   1  TGFB_HUMAN   LATENT TRANSFORMING GR   3.88e-33
  30    278      5.1    1959   1  AGRI_RAT     AGRIN PRECURSOR.         3.97e-34
  31    272      5.0    1245   1  NIDO_MOUSE   NIDOGEN PRECURSOR (ENT   1.67e-32
  32    274      5.0    1268   1  PGCN_MOUSE   NEUROCAN CORE PROTEIN    6.31e-33
  33    267      4.9    1712   1  TGFB_RAT     LATENT TRANSFORMING GR   1.89e-31
  34    269      4.9    2911   1  FBN2_HUMAN   FIBRILLIN 2 PRECURSOR.   7.16e-32
  35    259      4.8     816   1  NEL_CHICK    NEL PROTEIN PRECURSOR    8.95e-30
  36    263      4.8    2871   1  FBN1_BOVIN   FIBRILLIN 1 PRECURSOR    1.30e-30
  37    262      4.8    2871   1  FBN1_MOUSE   FIBRILLIN 1 PRECURSOR.   2.11e-30
  38    261      4.8    2871   1  FBN1_HUMAN   FIBRILLIN 1 PRECURSOR.   3.42e-30
  39    254      4.7     816   1  NEL_HUMAN    NEL PROTEIN PRECURSOR    9.85e-29
  40    255      4.7    2907   1  FBN2_MOUSE   FIBRILLIN 2 PRECURSOR.   6.10e-29
  41    243      4.5     816   1  NEL_MOUSE    NEL PROTEIN PRECURSOR    1.85e-26
  42    246      4.5    1955   1  AGRI_CHICK   AGRIN PRECURSOR.         4.47e-27
  43    239      4.4     816   1  NEL_RAT      NEL PROTEIN PRECURSOR    1.23e-25
  44    240      4.4    1247   1  NIDO_HUMAN   NIDOGEN PRECURSOR (ENT   7.65e-26
  45    234      4.3    1376   1  NID2_HUMAN   NIDOGEN-2 PRECURSOR (N   1.29e-24

RESULT 1 

ID SLIT_DROME STANDARD; PRT; 1480 AA.

AC P24014; 

DT 01-MAR-1992 (REL. 21, CREATED) 

DT 01-MAR-1992 (REL. 21, LAST SEQUENCE UPDATE) 

DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE) 

DE SLIT PROTEIN PRECURSOR. 

GN SLI. 

OS DROSOPHILA MELANOGASTER (FRUIT FLY). 

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA. 

RN [1] 

RP SEQUENCE FROM N.A.

RX MEDLINE; 91099665.

RA ROTHBERG J.M., JACOBS J.R., GOODMAN C.S., ARTAVANIS-TSAKONAS S.;

RT "Slit: an extracellular protein necessary for development of midline

RT glia and commissural axon pathways contains both EGF and LRR

RT domains.";

RL GENES DEV. 4:2169-2187(1990). 

CC -!- FUNCTION: NECESSARY FOR DEVELOPMENT OF MIDLINE GLIA AND 

CC COMMISSURAL AXON PATHWAYS. SLIT MAY INTERACT WITH EXTRACELLULAR 

CC MATRIX MOLECULES. 

CC -!- TISSUE SPECIFICITY: EXCRETED BY THE MIDLINE GLIA CELLS AND
CC EVENTUALLY DISTRIBUTED ALONG THE AXONS.

CC -!- ALTERNATIVE PRODUCTS: GIVES RISE TO 2 DISTINCT PROTEINS DIFFERING

CC BY 11 AA AT THE C-TERMINUS OF THE LAST EGF REPEAT.

CC -!- SIMILARITY: CONTAINS 7 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN

CC MANY PROTEINS, NUMBER IN THIS PROTEIN: 22. TWO BLOCK OF 6 LRR'S

CC AND TWO BLOCKS OF 5 LRR'S.

CC -!- SIMILARITY: CONTAINS A C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK).

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; X53959; G8615; -. 

DR PIR; A36665; A36665. 

DR FLYBASE; FBgn0003425; Sli. 

DR PROSITE; PS00010; ASX_HYDROXYL; 3.

DR PROSITE; PS00022; EGF_1; 7.

DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 5.
DR PROSITE; PS01187; EGF_CA; 2.
DR PROSITE; PS01225; CTCK_2; 1.
DR PFAM; PF00007; Cys_knot; 1.
DR PFAM; PF00008; EGF; 7.
DR PFAM; PF00054; laminin_G; 1.
DR PFAM; PF00560; LRR; 10.
DR HSSP; P00740; 1IXA.
KW NEUROGENESIS; GLYCOPROTEIN; SIGNAL; ALTERNATIVE SPLICING;
KW EGF-LIKE DOMAIN; REPEAT; LEUCINE-REPEAT; DUPLICATION.
FT SIGNAL        1     36
FT CHAIN        37   1480  SLIT PROTEIN.
FT DOMAIN       70    104  CONSERVED N-FLANKING REGION OF THE LRR.
FT DOMAIN      105    230  LEUCINE-RICH REPEATS (1ST REGION).
FT DOMAIN      231    294  CONSERVED C-FLANKING REGION OF THE LRR.
FT DOMAIN      295    326  CONSERVED N-FLANKING REGION OF THE LRR.
FT DOMAIN      327    452  LEUCINE-RICH REPEATS (2ND REGION).
FT DOMAIN      453    518  CONSERVED C-FLANKING REGION OF THE LRR.
FT DOMAIN      519    550
FT DOMAIN      551    653  LEUCINE-RICH REPEATS (3RD REGION).
FT DOMAIN      654    714  CONSERVED C-FLANKING REGION OF THE LRR.
FT DOMAIN      715    746  CONSERVED N-FLANKING REGION OF THE LRR.
FT DOMAIN      747    848  LEUCINE-RICH REPEATS (4TH REGION).
FT DOMAIN      849    910  CONSERVED C-FLANKING REGION OF THE LRR.
FT REPEAT      105    115
FT REPEAT      116    139  LRR 1-2.
FT REPEAT      140    163  LRR 1-3.
FT REPEAT      164    187  LRR 1-4.
FT REPEAT      188    211
FT REPEAT      212         LRR 1-6.
FT REPEAT                  LRR 2-1.
FT REPEAT
FT REPEAT                  LRR 2-3.
FT REPEAT      386    409  LRR 2-4.
FT REPEAT      410    433  LRR 2-5.
FT REPEAT      434    452  LRR 2-6.
FT REPEAT      551    562  LRR 3-1.
FT REPEAT      563    586  LRR 3-2.
FT REPEAT      587    610  LRR 3-3.
FT REPEAT      611    634  LRR 3-4.
FT REPEAT      635    653  LRR 3-5.
FT REPEAT      747    757  LRR 4-1.
FT REPEAT      758    781  LRR 4-2.
FT REPEAT                  LRR 4-3.
FT REPEAT      806    829  LRR 4-4.
FT REPEAT      830    848  LRR 4-5.
FT DOMAIN      907
FT DOMAIN      946    983  EGF-LIKE 2.
FT DOMAIN      985   1022  EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN     1024   1062  EGF-LIKE 4.
FT DOMAIN     1064   1100  EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN     1111   1149  EGF-LIKE 6.
FT DOMAIN     1353   1392  EGF-LIKE 7.
FT DOMAIN     1409
FT VARSPLIC   1394   1404  MISSING (IN SHORT FORM).
FT CARBOHYD    111    111  POTENTIAL.
FT CARBOHYD    207    207  POTENTIAL.
FT CARBOHYD    357    357  POTENTIAL.
FT CARBOHYD    435    435  POTENTIAL.
FT CARBOHYD    783    783  POTENTIAL.
FT CARBOHYD    788    788  POTENTIAL.
FT CARBOHYD    958    958  POTENTIAL.
FT CARBOHYD    998    998  POTENTIAL.
FT CARBOHYD   1060   1060  POTENTIAL.
FT CARBOHYD   1159   1159  POTENTIAL.
FT CARBOHYD   1175   1175  POTENTIAL.
FT CARBOHYD   1243   1243  POTENTIAL.
FT CARBOHYD   1292   1292  POTENTIAL.
FT DISULFID    911    922  BY SIMILARITY.
FT DISULFID    916    932  BY SIMILARITY.
FT DISULFID    934    943  BY SIMILARITY.
FT DISULFID    950    961  BY SIMILARITY.
FT DISULFID    955    971  BY SIMILARITY.
FT DISULFID    973    982  BY SIMILARITY.
FT DISULFID    989   1001  BY SIMILARITY.
FT DISULFID    995   1010  BY SIMILARITY.
FT DISULFID   1012   1021  BY SIMILARITY.
FT DISULFID   1028   1041  BY SIMILARITY.
FT DISULFID   1035   1050  BY SIMILARITY.
FT DISULFID   1052   1061  BY SIMILARITY.
FT DISULFID   1068   1079  BY SIMILARITY.
FT DISULFID   1073   1088  BY SIMILARITY.
FT DISULFID   1090   1099  BY SIMILARITY.
FT DISULFID   1115   1125  BY SIMILARITY.
FT DISULFID   1120   1137  BY SIMILARITY.
FT DISULFID   1139   1148  BY SIMILARITY.
FT DISULFID   1357   1368  BY SIMILARITY.
FT DISULFID   1362   1380  BY SIMILARITY.
FT DISULFID   1382   1391  BY SIMILARITY.
FT DISULFID   1409   1443  BY SIMILARITY.
FT DISULFID   1423   1457  BY SIMILARITY.
FT DISULFID   1434   1473  BY SIMILARITY.
FT DISULFID   1438   1475  BY SIMILARITY.
FT DISULFID   1442   1479  BY SIMILARITY.
SQ SEQUENCE 1480 AA; 165752 MW; 2CD1C421 CRC32;

Query Match 33.8%; Score 1838; DB 1; Length 1480;
Best Local Similarity 41.8%; Pred. No. 0.00e+00;
Matches 257; Conservative 143; Mismatches 194; Indels 21; Gaps 15;

Db 729 SRNQLKEIPRGIPAETSELYLESNEIEQIHYERIRHLRSLTRLDLSNNQITILSNYTFAN 788
1 : 1 M II MMMI::! MM : M IIIMIIIM:: 1 1 MM
Qy 1 SNKNLTSFPSRIPFDTTELYLDAN7INEIPAHDLNRLYSLTKLDLSHNRLISLENNTFSN 60

Db 789 LTKLSTLIISYNKLQCLQRHALSGLNNLRVVSLHGNRISMLPEGSFEDLKSLTHIALGSN 848
MINIMUM:!!! 1 : : 1 1 1 MMMI II ll:::| :| IMIIIMII
Qy 61 LTRLSTLIISYNKLRCLQPLAFNGLNALRILSLHGNDISFLPQSAFSNLTSITHIAVGSN 120

Db 849 PLYCDCGLKWFSDWIKLDYVEPGIARCAEPEQMKDKLILSTPSSSFVCRGRVRNDILAKC 908
Mill : III III : : 1 : 1 1 1 1 I: : : IM:: : 1 1 ;:| : Ml
Qy 121 SLYCDCNMAWFSKWIKSKFIEAGIARCEyPNTVSNQLLLTAQPYQFTCDSKVPTKLATKC 180

Db 909 NACFEQPCQNQAQCVALPQREYQCLCQPGYHGKHCEFMIDACYGNPCRNNATCTVLEEGR 968
: 1:: II 1 1 1 : : 1 1 1 1 II: 1 III IIIIIIMI Mill 1 : II
Qy 181 DLCLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGR 240

Db 969 FSCQCAPGYTGARCETNIDDCL-G--EI--KCQNNAT-C--I-D--G--VESYKCECQP 1014
Ml 1 1: 1 II Mill: : 1 II : 1 : : : : : 1 1 : 1 : 1
Qy 241 FNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPM 300

Db 1015 GFSGEFCDTKIQFCSPEFNPCANGAKCMDHFTHYSCDCQAGFHGTNCTDNIDDCQNHMCQ 1074
: 1 1: |::M: MM 1 MM III 1 Ml 1 II Mill 1 II
Qy 301 EYEGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQ 360

Db 1075 NGGTCVDGINDYQCRCPDDYTGKYCEGHNMISMMYPQTSPCQNHECKHGVCFQPNAQGSD 1134
llhlllll Ml 1 IM III 1: 1 1 1 Ml 1 Ml :: Ml
Qy 361 NGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECV-ASQNSSD 419

Db 1135 YLCRCHPGYTGKKCEYLTSISFVHNNSFVELEPLRTRPEANVTIVFSSAEQNGILMYDGQ 1194
: 1:11 MM 1.: MM : ::: IMI : ::: M : : :: MM |:
Qy 420 FTCKCHEGFSGPSCDRQMSVGFKNPGAYLALDPL-A-SDGTITMTLRTTSKIGILLYYGD 477

Db 1195 DAHLAVELFNGRIRVSYDVGNHPVSTMYSFEMVADGKYHAVEL-LAIKKNFTLRVDRGLA 1253
M:MI::: 1 Ml 1 1 III MM 1 : : : M 1 MM:
Qy 478 DHFVSAELYDGRVKLVYYIGNFPASHMYSSVKVNDGLPHRISIRTSERKCF-LQIDKNPV 536

Db 1254 RSIINEGSNDYLKLT-TPM-FLGGLPVDPAQQAYKNWQIRNLTSFKGCMKEVWINHKLVD 1311
: : 1 IMI 1 "Nil:: MM : "M IMIM : II ::
Qy 537 QIVENSGKSDQLITKGKEMLYIGGLPIEKSQDAKRRFHVKNSESLKGCISSITINEVPIN 596

Db 1312 FGNAQRQQKITPGCA 1326
: 1 : M:
Qy 597 LQQALENVNTEQSCS 611


RESULT 2 
ID FBP1_STRPU STANDARD; PRT; 1064 AA.

AC P10079;

DT 01-MAR-1989 (REL. 10, CREATED)

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)

DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)

DE FIBROPELLIN I PRECURSOR (EPIDERMAL GROWTH FACTOR-RELATED PROTEIN 1)

DE (UEGF-1).

GN EGF1.

OS STRONGYLOCENTROTUS PURPURATUS (PURPLE SEA URCHIN).

OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA;

OC EUECHINOIDEA; ECHINACEA; ECHINOIDA; STRONGYLOCENTROTIDAE;

OC STRONGYLOCENTROTUS.

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 90112459. 

RA DELGADILLO-REYNOSO M.G., ROLLO D.R., HURSH D.A., RAFF R.A.;

RT "Structural analysis of the uEGF gene in the sea urchin
RT Strongylocentrotus purpuratus reveals more similarity to vertebrate
RT than to invertebrate genes with EGF-like repeats.";

RL J. MOL. EVOL. 29:314-327(1989). 

RN [2] 

RP SEQUENCE OF 279-476 AND 781-1064 FROM N.A. 

RX MEDLINE; 87319677. 

RA HURSH D.A., ANDREWS M.E., RAFF R.A.; 

RT "A sea urchin gene encodes a polypeptide homologous to epidermal 

RT growth factor.";

RL SCIENCE 237:1487-1490(1987). 

RN [3] 

RP AVIDIN-LIKE DOMAIN. 

RX MEDLINE; 89196806.

RA HUNT L.T., BARKER W.C.; 

RT "Avidin-like domain in an epidermal growth factor homolog from a sea 

RT urchin."; 

RL FASEB J. 3:1760-1764(1989). 

RN [4] 

RP CHARACTERIZATION. 

RX MEDLINE; 91285254. 

RA BISGROVE B.W., ANDREWS M.E., RAFF R.A.; 

RT "Fibropellins, products of an EGF repeat-containing gene, form a 

RT unique extracellular matrix structure that surrounds the sea urchin 

RT embryo."; 

RL DEV. BIOL. 146:89-99(1991). 

CC -!- FUNCTION: FORM THE APICAL LAMINA, A COMPONENT OF THE EXTRACELLULAR 
CC MATRIX. 

CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR. IN VESICLES IN THE CYTOPLASM
CC OF UNFERTILIZED EGGS, THEN TO THE BASE OF THE HYALIN LAYER
CC THROUGHOUT DEVELOPMENT AND FINALLY IN THE APICAL LAMINA IN LATE
CC EMBRYOS AND EARLY LARVAE.

CC -!- DEVELOPMENTAL STAGE: MODERATE LEVELS IN UNFERTILIZED EGGS AND
CC DURING EARLY CLEAVAGE, THEN RAPIDLY INCREASES IN ABUNDANCE BETWEEN 
CC LATE MORULA AND MESENCHYME BLASTULA STAGES TO MAXIMAL LEVELS 
CC MAINTAINED THROUGH SUBSEQUENT STAGES. EXPRESSED BOTH MATERNALLY 
CC AND ZYGOTICALLY. 

CC -!- ALTERNATIVE PRODUCTS: TWO FORMS (IA AND IB) ARE PRODUCED BY 
CC ALTERNATIVE SPLICING. THE SMALL FORM (IB) LACKS 8 EGF REPEATS. 

CC -!- SIMILARITY: CONTAINS 21 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: CONTAINS 1 CUB DOMAIN. 

CC -!- SIMILARITY: THE C-TERMINAL DOMAIN OF THIS PROTEIN IS SIMILAR 
CC TO AVIDIN/STREPTAVIDIN.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; L08692; G161467; -.

DR EMBL; L08692; G161466; -.

DR EMBL; X17530; G667061; -. 

DR EMBL; M17421; G552260; -. 

DR EMBL; X17533; G667062; -. 



DR PIR; A29316; A29316. 

DR PROSITE; PS00010; ASX_HYDROXYL; 19.

DR PROSITE; PS00022; EGF_1; 19.

DR PROSITE; PS00577; AVIDIN; 1.

DR PROSITE; PS01180; CUB; 1.

DR PROSITE; PS01186; EGF_2; 19.

DR PROSITE; PS01187; EGF_CA; 19.

DR PFAM; PF00008; EGF; 21. 

DR PFAM; PF00431; CUB; 1. 

DR HSSP; P01132; 1EPH. 

KW BIOTIN; ALTERNATIVE SPLICING; EGF-LIKE DOMAIN; REPEAT; SIGNAL; 

KW GLYCOPROTEIN. 



FT SIGNAL        1     19  POTENTIAL.
FT CHAIN        20   1064  FIBROPELLIN I.
FT DOMAIN       20     55  EGF-LIKE 1.
FT DOMAIN       62    175  CUB.
FT DOMAIN      176    212
FT DOMAIN      214    250
FT DOMAIN      252    288  EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      290    326  EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      328    364  EGF-LIKE 6, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      366    402
FT DOMAIN      404    440  EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      442    478
FT DOMAIN      480    516  EGF-LIKE 10, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      518    554  EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      556    592  EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      594    630  EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      632    668  EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      670    706  EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      708    744  EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      746    782  EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      784    820  EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      822    858  EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      860    896  EGF-LIKE 20.
FT DOMAIN      898    934  EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      936   1064  AVIDIN-LIKE.
FT DISULFID     23     34  BY SIMILARITY.
FT DISULFID     28     43  BY SIMILARITY.
FT DISULFID     45     54  BY SIMILARITY.
FT DISULFID    180    191  BY SIMILARITY.
FT DISULFID    185    200  BY SIMILARITY.
FT DISULFID    202    211  BY SIMILARITY.
FT DISULFID    218    229  BY SIMILARITY.
FT DISULFID    223    238  BY SIMILARITY.
FT DISULFID    240    249  BY SIMILARITY.
FT DISULFID    256    267  BY SIMILARITY.
FT DISULFID    261    276  BY SIMILARITY.
FT DISULFID    278    287  BY SIMILARITY.
FT DISULFID    294    305  BY SIMILARITY.
FT DISULFID    299    314  BY SIMILARITY.
FT DISULFID    316    325  BY SIMILARITY.
FT DISULFID    332    343  BY SIMILARITY.
FT DISULFID    337    352  BY SIMILARITY.
FT DISULFID    354    363  BY SIMILARITY.
FT DISULFID    370    381  BY SIMILARITY.
FT DISULFID    375    390  BY SIMILARITY.
FT DISULFID    392    401  BY SIMILARITY.
FT DISULFID    408    419  BY SIMILARITY.
FT DISULFID    413    428  BY SIMILARITY.
FT DISULFID    430    439  BY SIMILARITY.
FT DISULFID    446    457  BY SIMILARITY.
FT DISULFID    451    466  BY SIMILARITY.
FT DISULFID    468    477  BY SIMILARITY.
FT DISULFID    484    495  BY SIMILARITY.
FT DISULFID    489    504  BY SIMILARITY.
FT DISULFID    506    515  BY SIMILARITY.
FT DISULFID    522    533  BY SIMILARITY.
FT DISULFID    527    542  BY SIMILARITY.
FT DISULFID    544    553  BY SIMILARITY.
FT DISULFID    560    571  BY SIMILARITY.
FT DISULFID    565    580  BY SIMILARITY.
FT DISULFID    582    591  BY SIMILARITY.
FT DISULFID    598    609  BY SIMILARITY.
FT DISULFID    603    618  BY SIMILARITY.
FT DISULFID    620    629  BY SIMILARITY.
FT DISULFID    636    647  BY SIMILARITY.
FT DISULFID    641    656  BY SIMILARITY.
FT DISULFID    658    667  BY SIMILARITY.
FT DISULFID    674    685  BY SIMILARITY.
FT DISULFID    679    694  BY SIMILARITY.
FT DISULFID    696    705  BY SIMILARITY.
FT DISULFID    712    723  BY SIMILARITY.
FT DISULFID    717    732  BY SIMILARITY.
FT DISULFID    734    743  BY SIMILARITY.
FT DISULFID    750    761  BY SIMILARITY.
FT DISULFID    755    770  BY SIMILARITY.
FT DISULFID    772    781  BY SIMILARITY.
FT DISULFID    788    799  BY SIMILARITY.
FT DISULFID    793    808  BY SIMILARITY.
FT DISULFID    810    819  BY SIMILARITY.
FT DISULFID    826    837  BY SIMILARITY.
FT DISULFID    831    846  BY SIMILARITY.
FT DISULFID    848    857  BY SIMILARITY.
FT DISULFID    864    875  BY SIMILARITY.
FT DISULFID    869    884  BY SIMILARITY.
FT DISULFID    886    895  BY SIMILARITY.
FT DISULFID    902    913  BY SIMILARITY.
FT DISULFID    907    922  BY SIMILARITY.
FT DISULFID    924    933  BY SIMILARITY.
FT VARSPLIC    477    780  MISSING (IN FORM IB).
FT CARBOHYD     30     30  POTENTIAL.
FT CARBOHYD    136    136  POTENTIAL.
FT CARBOHYD    851    851  POTENTIAL.
FT CONFLICT    279    279  L -> S (IN REF. 2).

SQ SEQUENCE 1064 AA; 112072 MW; FBD10D48 CRC32;

Query Match 10.8%; Score 588; DB 1; Length 1064;

Best Local Similarity 39.7%; Pred. No. 4.02e-104;

Matches 102; Conservative 47; Mismatches 79; Indels 29; Gaps 17;

Db 216 DECASDPCQNGGAC-VDGVNGYVCNCVPGFDGDECENNINECASSPCLNGGIC-VDGVNM 273

I I : II I : I : I III III I III h I :||||| : I I 
Qy 181 DLCLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGR 240 

Db 274 FECTCLAGFTGVRCEVNIDECASAPCQNGG I-C— I-D-6-INGYTCSCPL 318 

1:1 I II I II 111:1 Ml : I : : : ||:| | ||: 
Qy 241 FNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPM 300 

Db 319 GFSGDNCENNDDECSS-I-PCLNGGTCVDLVNAYMCVCAPGWTGPTCADNIDECASAPCQ 376

: I Ml:: : |: : || | | |: : :| :|:|| || | ||| ; | : [
Qy 301 EYEGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQ 360

Db 377 NGGVCIDGVNGYMCDCQPGYTGTHCETD--ID-ECARP-PCQ-N--G-GDCVDGVNG--Y 426
III Ml: :| | I : I I : I || :| | : :|| : | |:|| ; ; : 
Qy 361 NGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNSSDF 420 

Db 427 VCICAPGFDGLNCENNI 443 

II II I :|: : 
Qy 421 TCKCHEGFSGPSCDRQM 437 



RESULT 3 

ID FBP3_STRPU STANDARD; PRT; 570 AA.

AC P49013; 

DT 01-FEB-1996 (REL. 33, CREATED) 

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE) 

DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE) 

DE FIBROPELLIN C PRECURSOR (EPIDERMAL GROWTH FACTOR-RELATED PROTEIN 3)

DE (EGF III) (FIBROPELLIN III).

GN EGF3.

OS STRONGYLOCENTROTUS PURPURATUS (PURPLE SEA URCHIN).

OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; 

OC EUECHINOIDEA; ECHINACEA; ECHINOIDA; STRONGYLOCENTROTIDAE; 

OC STRONGYLOCENTROTUS. 

RN [1] 



RP SEQUENCE FROM N.A.
RC TISSUE=GASTRULA;
RX MEDLINE; 93273088.
RA BISGROVE B.W., RAFF R.A.;
RT "The SpEGF III gene encodes a member of the fibropellins: EGF repeat-
RT containing proteins that form the apical lamina of the sea urchin
RT embryo.";
RL DEV. BIOL. 157:526-538(1993).
CC -!- FUNCTION: FORM THE APICAL LAMINA, A COMPONENT OF THE EXTRACELLULAR
CC MATRIX.
CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR.
CC -!- DEVELOPMENTAL STAGE: LOW LEVELS IN UNFERTILIZED EGGS AND DURING
CC EARLY CLEAVAGE, THEN RAPIDLY INCREASES IN ABUNDANCE BETWEEN LATE
CC MORULA AND MESENCHYME BLASTULA STAGES TO MAXIMAL LEVELS MAINTAINED
CC THROUGH SUBSEQUENT STAGES.
CC -!- EXPRESSED BOTH MATERNALLY AND ZYGOTICALLY.
CC -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 1 CUB DOMAIN.
CC -!- SIMILARITY: THE C-TERMINAL DOMAIN OF THIS PROTEIN IS SIMILAR
CC TO AVIDIN/STREPTAVIDIN.
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
DR EMBL; L07045; G310660; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 8.
DR PROSITE; PS00022; EGF_1; 8.
DR PROSITE; PS00577; AVIDIN; 1.
DR PROSITE; PS01180; CUB; 1.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 6.
DR PFAM; PF00008; EGF; 8.
DR PFAM; PF00431; CUB; 1.
DR HSSP; P00740; 1IXA.
KW BIOTIN; EGF-LIKE DOMAIN; REPEAT; SIGNAL; GLYCOPROTEIN.



FT SIGNAL        1     17  POTENTIAL.
FT CHAIN        18    570  FIBROPELLIN C.
FT DOMAIN       18     55  EGF-LIKE 1.
FT DOMAIN       62    175  CUB.
FT DOMAIN      176    212  EGF-LIKE 2, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      214    250  EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      252    288  EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      290    326  EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      328    364  EGF-LIKE 6, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      366    402  EGF-LIKE 7.
FT DOMAIN      404    440  EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      442    570  AVIDIN-LIKE.
FT DISULFID     23     34  BY SIMILARITY.
FT DISULFID     28     43  BY SIMILARITY.
FT DISULFID     45     54  BY SIMILARITY.
FT DISULFID    180    191  BY SIMILARITY.
FT DISULFID    185    200  BY SIMILARITY.
FT DISULFID    202    211  BY SIMILARITY.
FT DISULFID    218    229  BY SIMILARITY.
FT DISULFID    223    238  BY SIMILARITY.
FT DISULFID    240    249  BY SIMILARITY.
FT DISULFID    256    267  BY SIMILARITY.
FT DISULFID    261    276  BY SIMILARITY.
FT DISULFID    278    287  BY SIMILARITY.
FT DISULFID    294    305  BY SIMILARITY.
FT DISULFID    299    314  BY SIMILARITY.
FT DISULFID    316    325  BY SIMILARITY.
FT DISULFID    332    343  BY SIMILARITY.
FT DISULFID    337    352  BY SIMILARITY.
FT DISULFID    354    363  BY SIMILARITY.
FT DISULFID    370    381  BY SIMILARITY.
FT DISULFID    375    390  BY SIMILARITY.
FT DISULFID    392    401  BY SIMILARITY.
FT DISULFID    408    419  BY SIMILARITY.
FT DISULFID    413    428  BY SIMILARITY.
FT DISULFID    430    439  BY SIMILARITY.
FT CARBOHYD     30     30  POTENTIAL.
FT CARBOHYD    136    136  POTENTIAL.
FT CARBOHYD    357    357  POTENTIAL.
SQ SEQUENCE 570 AA; 61116 MW; 265BC4BB CRC32;

Query Match 10.3%; Score 559; DB 1; Length 570;
Best Local Similarity 42.2%; Pred. No. 2.70e-97;
Matches 86; Conservative 35; Mismatches 64; Indels 19; Gaps



Db 256 CASIPCLNGGIC-VDGINQFACTCLPGYTGILCETDINECASSPCQNGGSCTDA-VNRYT 313 

1:11':! : ::| I ||: |: || :|: | :||| | ::| | |: 
Qy 183 CLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFN 242 

Db 314 CDCRAGFTGSNCETNI NECASSPCLNGGSC • ■ L D G--VDGYVCQCLPNY 358
I I II I II ll::| :| I III I I : : :::| |:| :|
Qy 243 CYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEY 302

Db 359 TGTHCEISLDACAS-L-PCQNGGVCTNVGGDYVCECLPGYTGINCEIDINECASLPCQNG 416 

I III I: I: I Ihl II : I I I I 11:11 III :|::| :: III! 
Qy 303 EGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNG 362 

Db 417 GECINGIAMYICQCRQGYAGVNCE 440

I h:|l I I II Nil II 
Qy 363 GSCVDGILSYDCLCRPGYAGQYCE 386 



RESULT 4 
ID NOTC_DROME STANDARD; PRT; 2703 AA.
AC P07207; P04154;
DT 01-NOV-1986 (REL. 03, CREATED)
DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH PROTEIN PRECURSOR.
GN N.
OS DROSOPHILA MELANOGASTER (FRUIT FLY).
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 86079539.
RA WHARTON K.A., JOHANSEN K.M., XU T., ARTAVANIS-TSAKONAS S.;
RT "Nucleotide sequence from the neurogenic locus notch implies a gene
RT product that shares homology with proteins containing EGF-like
RT repeats.";
RL CELL 43:567-581(1985).
RN [2]
RP SEQUENCE FROM N.A.
RC STRAIN=OREGON-R;
RX MEDLINE; 87064624.
RA KIDD S., KELLEY M.R., YOUNG M.W.;
RT "Sequence of the notch locus of Drosophila melanogaster: relationship
RT of the encoded protein to mammalian clotting and growth factors.";
RL MOL. CELL. BIOL. 6:3094-3108(1986).
RN [3]
RP SEQUENCE OF 2505-2611 FROM N.A.
RX MEDLINE; 85099329.
RA WHARTON K.A., YEDVOBNICK B., FINNERTY V.G., ARTAVANIS-TSAKONAS S.;
RT "opa: a novel family of transcribed repeats shared by the Notch locus
RT and other developmentally regulated loci in D. melanogaster.";
RL CELL 40:55-62(1985).
RN [4]
RP SEQUENCE OF 1-8 FROM N.A.
RX MEDLINE; 87257846.
RA KELLEY M.R., KIDD S., BERG R.L., YOUNG M.W.;

RT "Restriction of P-element insertions at the Notch locus of Drosophila 

RT melanogaster."; 

RL MOL. CELL. BIOL. 7:1545-1548(1987). 

RN [5] 

RP REVIEW.



RX 



RX 



RA HARRIS W. A.; 

RT "Many cell types specified by Notch function."; 

RL CURR. BIOL. 1:120-122(1991).

CC -!- FUNCTION: NOTCH PROTEIN IS ESSENTIAL FOR PROPER DIFFERENTIATION OF
CC ECTODERM.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- SEPARATION OF NEUROBLASTS FROM THE ECTODERM INTO THE INNER PART

CC OF EMBRYO IS ONE OF THE FIRST STEPS OF CNS DEVELOPMENT IN INSECTS,

CC THIS PROCESS IS UNDER CONTROL OF THE NEUROGENIC GENES.

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; M16152; G157988; -. 

DR EMBL; M16153; G157988; JOINED. 

DR EMBL; M16149; G157988; JOINED.

DR EMBL; M16150; G157988; JOINED.

DR EMBL; M16151; G157988; JOINED.

DR EMBL; K03508; G157993; -. 

DR EMBL; M13689; G157993; JOINED. 

DR EMBL; K03507; G157993; JOINED. 

DR EMBL; M12175; G950317; -. 

DR EMBL; M16025; G157995; -. 

DR PIR; A24420; A24420.

DR PIR; A24768; A24768. 

DR PIR; A05267; A05267. 

DR FLYBASE; FBgn0004647; N. 

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 28.

DR PROSITE; PS01187; EGF_CA; 22.

DR PFAM; PF00008; EGF; 36. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN. 



FT SIGNAL        1     44  POTENTIAL.
FT CHAIN        45   2703  NEUROGENIC LOCUS NOTCH PROTEIN.
FT DOMAIN       45   1745  EXTRACELLULAR (POTENTIAL).
FT TRANSMEM   1746   1766  POTENTIAL.
FT DOMAIN     1767   2703  CYTOPLASMIC (POTENTIAL).
FT DOMAIN       58   1451  36 X EGF-TYPE REPEATS.
FT DOMAIN       58     95  EGF-LIKE 1.
FT DOMAIN       96    136  EGF-LIKE 2.
FT DOMAIN      139    176  EGF-LIKE 3.
FT DOMAIN      177    215  EGF-LIKE 4.
FT DOMAIN      217    253  EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      255    291  EGF-LIKE 6.
FT DOMAIN      293    329  EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      331    370  EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      372    408  EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      409    447  EGF-LIKE 10.
FT DOMAIN      449    486  EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      488    524  EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      526    562  EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      564    600  EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      602    637  EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      639    675  EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      677    713  EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      715    751  EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      753    789  EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      791    827  EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN      829    865  EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
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FT 


     DOMAIN      867    905       EGF-LIKE 22.
FT   DOMAIN      907    944       EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      946    982       EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      984   1020       EGF-LIKE 25.
FT   DOMAIN     1022   1058       EGF-LIKE 26, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1060   1096       EGF-LIKE 27.
FT   DOMAIN     1098   1134       EGF-LIKE 28.
FT   DOMAIN     1136   1181       EGF-LIKE 29.
FT   DOMAIN     1183   1219       EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1221   1257       EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1259   1295       EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1297   1335       EGF-LIKE 33.
FT   DOMAIN     1337   1373       EGF-LIKE 34.
FT   DOMAIN     1375   1412       EGF-LIKE 35.
FT   DOMAIN     1415   1451       EGF-LIKE 36.
FT   DOMAIN     1475   1593       3 X LIN/NOTCH REPEATS.
FT   REPEAT     1475   1513       LIN/NOTCH 1.
FT   REPEAT     1514   1553       LIN/NOTCH 2.
FT   REPEAT     1554   1593       LIN/NOTCH 3.
FT   DOMAIN     1896   2109       6 X ANK MOTIF REPEATS.
FT   DOMAIN     2538   2568       POLY-GLN (OPA-REPEAT).
FT   DISULFID     62     73       BY SIMILARITY.
FT   DISULFID     67     83       BY SIMILARITY.
FT   DISULFID     85     94       BY SIMILARITY.
FT   DISULFID    100    111       BY SIMILARITY.
FT   DISULFID    105    124       BY SIMILARITY.
FT   DISULFID    126    135       BY SIMILARITY.
FT   DISULFID    143    154       BY SIMILARITY.
FT   DISULFID    148    164       BY SIMILARITY.
FT   DISULFID    166    175       BY SIMILARITY.
FT   DISULFID    181    192       BY SIMILARITY.
FT   DISULFID    186    203       BY SIMILARITY.
FT   DISULFID    205    214       BY SIMILARITY.
FT   DISULFID    221    232       BY SIMILARITY.
FT   DISULFID    226    241       BY SIMILARITY.
FT   DISULFID    243    252       BY SIMILARITY.
FT   DISULFID    259    270       BY SIMILARITY.
FT   DISULFID    264    279       BY SIMILARITY.
FT   DISULFID    281    290       BY SIMILARITY.
FT   DISULFID    297    308       BY SIMILARITY.
FT   DISULFID    302    317       BY SIMILARITY.
FT   DISULFID    319    328       BY SIMILARITY.
FT   DISULFID    335    349       BY SIMILARITY.
FT   DISULFID    343    358       BY SIMILARITY.
FT   DISULFID    360    369       BY SIMILARITY.
FT   DISULFID    376    387       BY SIMILARITY.
FT   DISULFID    381    396       BY SIMILARITY.
FT   DISULFID    398    407       BY SIMILARITY.
FT   DISULFID    413    424       BY SIMILARITY.
FT   DISULFID    418    435       BY SIMILARITY.
FT   DISULFID    437    446       BY SIMILARITY.
FT   DISULFID    453    465       BY SIMILARITY.
FT   DISULFID    459    474       BY SIMILARITY.
FT   DISULFID    476    485       BY SIMILARITY.
FT   DISULFID    492    503       BY SIMILARITY.
FT   DISULFID    497    512       BY SIMILARITY.
FT   DISULFID    514    523       BY SIMILARITY.
FT   DISULFID    530    541       BY SIMILARITY.
FT   DISULFID    535    550       BY SIMILARITY.
FT   DISULFID    552    561       BY SIMILARITY.
FT   DISULFID    568    579       BY SIMILARITY.
FT   DISULFID    573    588       BY SIMILARITY.
FT   DISULFID    590    599       BY SIMILARITY.
FT   DISULFID    606    616       BY SIMILARITY.
FT   DISULFID    611    625       BY SIMILARITY.
FT   DISULFID    627    636       BY SIMILARITY.
FT   DISULFID    643    654       BY SIMILARITY.
FT   DISULFID    648    663       BY SIMILARITY.
FT   DISULFID    665    674       BY SIMILARITY.
FT   DISULFID    681    692       BY SIMILARITY.
FT   DISULFID    686    701       BY SIMILARITY.
FT   DISULFID    703    712       BY SIMILARITY.
FT   DISULFID    719    730       BY SIMILARITY.
FT   DISULFID    724    739       BY SIMILARITY.
FT   DISULFID    741    750       BY SIMILARITY.
FT   DISULFID    757    768       BY SIMILARITY.
FT   DISULFID    762    777       BY SIMILARITY.
FT   DISULFID    779    788       BY SIMILARITY.
FT   DISULFID    795    806       BY SIMILARITY.
FT   DISULFID    800    815       BY SIMILARITY.
FT   DISULFID    817    826       BY SIMILARITY.
FT   DISULFID    833    844       BY SIMILARITY.
FT   DISULFID    838    853       BY SIMILARITY.
FT   DISULFID    855    864       BY SIMILARITY.

Note: remainder of annotations omitted.

Query Match 10.2%; Score 557; DB 1; Length 2703;
Best Local Similarity 38.1%; Pred. No. 7.95e-97;
Matches 96; Conservative 49; Mismatches 80; Indels 27; Gaps 17;

Db 681 CHSNPCNNGATC-IDGINSYKCQCVPGFTGQHGEKNVDECISSPCANNGVC-IDQVNGYK 738
1 -IM 1 I : 1 1 I III 1 III: :| 1 :||| ||: 1 : 1 ::
Qy 183 CLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFN 242

Db 739 CECPRGFYDAHCLSDVDECA-S---NP-CVNEGR-CEDGI-N-E-F-I---CHCPPGY 783
11:11 1 -1:1 1 1 1 1 : 1 1 : : 1 : 1 1 1 II 1
Qy 243 CYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEY 302

Db 784 TGKRCELDIDECSS--NPCQHGGTCYDKLNAFSCQCMPGYTGQKCETNIDDCVTNPCGNG 841
!|:N :: |: |||:: I I ::|| I ||:|| :|||||||| I 1
Qy 303 EGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNG 362

Db 842 GTCIDKVNGYKCVCKVPFTGRDCE-SKMD-PCA-SNRCK-N--EAKCTPSSNFLDFSC 893
|:|:| : :l l:|: ::|: II ::|| :: 1 : :: 1 :| 1 ||:|
Qy 363 GSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNSSDFTC 422

Db 894 TCKLGYTGRYCD 905
1 !::! II
Qy 423 RCHEGFSGPSCD 434

RESULT 5
ID NOTC_BRARE STANDARD; PRT; 2437 AA.
AC P46530;
DT 01-NOV-1995 (REL. 32, CREATED)
DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN PRECURSOR.
GN NOTCH.
OS BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII;
OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA;
OC CYPRINIDAE; RASBORINAE; DANIO.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=EMBRYO;
RX MEDLINE; 94128602.
RA BIERKAMP C., CAMPOS-ORTEGA J.A.;
RT "A zebrafish homologue of the Drosophila neurogenic gene Notch and
RT its pattern of transcription during early embryogenesis.";
RL MECH. DEV. 43:87-100(1993).
CC -!- FUNCTION: IMPLICATED IN CELL FATE SPECIFICATIONS DURING
CC EMBRYO DEVELOPMENT. MAY BE INVOLVED IN THE FORMATION OF THE
CC NEURAL PLATE, NOTOCHORD AND BRAIN VESICLES.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DEVELOPMENTAL STAGE: EXPRESSED IN ALL CELLS IN PREGASTRULATION
CC STAGES. DURING GASTRULATION IS DIFFERENTIALLY EXPRESSED,
CC ACCUMULATING PREDOMINANTLY IN THE PRECHORDAL MESODERM AND
CC NOTOCHORD. AT THE END OF GASTRULATION, EXPRESSED ALONG THE
CC ANTERIOR-POSTERIOR AXIS INCLUDING THE DEVELOPING NEURAL PLATE
CC AND DIFFERENTIATING MESODERM. ALSO PRESENT IN THE DEVELOPING
CC BRAIN AND HEAD REGIONS.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
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cc 


-!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; X69088; G433867;
DR PROSITE; PS00010; ASX_HYDROXYL; 23.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 28.
DR PROSITE; PS01187; EGF_CA; 22.
DR PFAM; PF00008; EGF; 36.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.
FT   SIGNAL        1     20       POTENTIAL.
FT   CHAIN        21   2437       NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN.
FT   DOMAIN       21   1724       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM   1725   1747       POTENTIAL.
FT   DOMAIN     1748   2437       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       21     57       EGF-LIKE 1.
FT   DOMAIN       58     98       EGF-LIKE 2.
FT   DOMAIN      101    138       EGF-LIKE 3.
FT   DOMAIN      139    175       EGF-LIKE 4.
FT   DOMAIN      177    215       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      217    254       EGF-LIKE 6.
FT   DOMAIN      256    292       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      294    332       EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      334    370       EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      371    409       EGF-LIKE 10.
FT   DOMAIN      411    449       EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      451    487       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      489    524       EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      526    562       EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      564    599       EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      601    637       EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      639    674       EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      676    712       EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      714    749       EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      751    787       EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      789    825       EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      827    865       EGF-LIKE 22.
FT   DOMAIN      867    903       EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      905    941       EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      943    979       EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      981   1017       EGF-LIKE 26.
FT   DOMAIN     1019   1055       EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1057   1093       EGF-LIKE 28.
FT   DOMAIN     1095   1141       EGF-LIKE 29.
FT   DOMAIN     1143   1179       EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1181   1217       EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1219   1263       EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1265   1303       EGF-LIKE 33.
FT   DOMAIN     1305   1344       EGF-LIKE 34.
FT   DOMAIN     1346   1382       EGF-LIKE 35.
FT   DOMAIN     1385   1423       EGF-LIKE 36.
FT   DOMAIN     1446   1561       3 X LIN/NOTCH REPEATS.
FT   REPEAT     1446   1486       LIN/NOTCH 1.
FT   REPEAT     1487   1520       LIN/NOTCH 2.
FT   REPEAT     1521   1561       LIN/NOTCH 3.
FT   DOMAIN     1861   2074       6 X ANK MOTIF REPEATS.
FT   REPEAT     1861   1891       ANK MOTIF 1.
FT   REPEAT     1892   1940       ANK MOTIF 1.
FT   REPEAT     1941   1974       ANK MOTIF 1.
FT   REPEAT     1975   2007       ANK MOTIF 1.
FT   REPEAT     2008   2040       ANK MOTIF 1.
FT   REPEAT     2041   2074       ANK MOTIF 1.
FT   DOMAIN     2265   2276       POLY-GLN (OPA-REPEAT).
FT   DISULFID     25     35       BY SIMILARITY.
FT   DISULFID     29     45       BY SIMILARITY.
FT   DISULFID     47     56       BY SIMILARITY.
FT   DISULFID     62     73       BY SIMILARITY.
FT   DISULFID     67     86       BY SIMILARITY.
FT   DISULFID     88     97       BY SIMILARITY.
FT   DISULFID    105    116       BY SIMILARITY.
FT   DISULFID    110    126       BY SIMILARITY.
FT   DISULFID    128    137       BY SIMILARITY.
FT   DISULFID    143    154       BY SIMILARITY.
FT   DISULFID    148    163       BY SIMILARITY.
FT   DISULFID    165    174       BY SIMILARITY.
FT   DISULFID    181    194       BY SIMILARITY.
FT   DISULFID    188    203       BY SIMILARITY.
FT   DISULFID    205    214       BY SIMILARITY.
FT   DISULFID    221    232       BY SIMILARITY.
FT   DISULFID    226    242       BY SIMILARITY.
FT   DISULFID    244    253       BY SIMILARITY.
FT   DISULFID    260    271       BY SIMILARITY.
FT   DISULFID    265    280       BY SIMILARITY.
FT   DISULFID    282    291       BY SIMILARITY.
FT   DISULFID    298    311       BY SIMILARITY.
FT   DISULFID    305    320       BY SIMILARITY.
FT   DISULFID    322    331       BY SIMILARITY.
FT   DISULFID    338    349       BY SIMILARITY.
FT   DISULFID    343    358       BY SIMILARITY.
FT   DISULFID    360    369       BY SIMILARITY.
FT   DISULFID    375    386       BY SIMILARITY.
FT   DISULFID    380    397       BY SIMILARITY.
FT   DISULFID    399    408       BY SIMILARITY.
FT   DISULFID    415    428       BY SIMILARITY.
FT   DISULFID    422    437       BY SIMILARITY.
FT   DISULFID    439    448       BY SIMILARITY.
FT   DISULFID    455    466       BY SIMILARITY.
FT   DISULFID    460    475       BY SIMILARITY.
FT   DISULFID    477    486       BY SIMILARITY.
FT   DISULFID    493    503       BY SIMILARITY.
FT   DISULFID    498    512       BY SIMILARITY.
FT   DISULFID    514    523       BY SIMILARITY.
FT   DISULFID    530    541       BY SIMILARITY.
FT   DISULFID    535    550       BY SIMILARITY.
FT   DISULFID    552    561       BY SIMILARITY.
FT   DISULFID    568    578       BY SIMILARITY.
FT   DISULFID    573    587       BY SIMILARITY.
FT   DISULFID    589    598       BY SIMILARITY.
FT   DISULFID    505    616       BY SIMILARITY.
FT   DISULFID    610    625       BY SIMILARITY.
FT   DISULFID    627    636       BY SIMILARITY.
FT   DISULFID    643    653       BY SIMILARITY.
FT   DISULFID    648    662       BY SIMILARITY.
FT   DISULFID    664    673       BY SIMILARITY.
FT   DISULFID    680    691       BY SIMILARITY.
FT   DISULFID    685    700       BY SIMILARITY.
FT   DISULFID    702    711       BY SIMILARITY.
FT   DISULFID    718    728       BY SIMILARITY.
FT   DISULFID    723    737       BY SIMILARITY.
FT   DISULFID    739    748       BY SIMILARITY.
FT   DISULFID    755    766       BY SIMILARITY.
FT   DISULFID    760    775       BY SIMILARITY.
FT   DISULFID    777    786       BY SIMILARITY.
FT   DISULFID    793    804       BY SIMILARITY.
FT   DISULFID    798    813       BY SIMILARITY.
FT   DISULFID    815    824       BY SIMILARITY.
FT   DISULFID    831    842       BY SIMILARITY.
FT   DISULFID    836    853       BY SIMILARITY.
FT   DISULFID    855    864       BY SIMILARITY.
FT   DISULFID    871    882       BY SIMILARITY.
FT   DISULFID    876    891       BY SIMILARITY.
FT   DISULFID    893    902       BY SIMILARITY.
FT   DISULFID    909    920       BY SIMILARITY.
FT   DISULFID    914    929       BY SIMILARITY.
FT   DISULFID    931    940       BY SIMILARITY.
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FT DISULFID 

                 947    958       BY SIMILARITY.
FT   DISULFID    952    967       BY SIMILARITY.
FT   DISULFID    969    978       BY SIMILARITY.
FT   DISULFID   1023   1034       BY SIMILARITY.
FT   DISULFID   1028   1043       BY SIMILARITY.
FT   DISULFID   1045   1054       BY SIMILARITY.
FT   DISULFID   1061   1072       BY SIMILARITY.
FT   DISULFID   1066   1081       BY SIMILARITY.
FT   DISULFID   1083   1092       BY SIMILARITY.
FT   DISULFID   1099   1120       BY SIMILARITY.
FT   DISULFID   1114   1129       BY SIMILARITY.
FT   DISULFID   1131   1140       BY SIMILARITY.
FT   DISULFID   1147   1158       BY SIMILARITY.
FT   DISULFID   1152   1167       BY SIMILARITY.
FT   DISULFID   1169   1178       BY SIMILARITY.
FT   DISULFID   1185   1196       BY SIMILARITY.
FT   DISULFID   1190   1205       BY SIMILARITY.
FT   DISULFID   1207   1216       BY SIMILARITY.
FT   DISULFID   1223   1242       BY SIMILARITY.
FT   DISULFID   1236   1251       BY SIMILARITY.
FT   DISULFID   1253   1262       BY SIMILARITY.



Note: remainder of annotations omitted. 



Query Match 10.1%; Score 548; DB 1; Length 2437; 

Best Local Similarity 39.6%; Pred. No. 1.03e-94; 



I I ::llll III:: IIMUM |:||| :|: I :||l : : |: : |: 



Matches 


Db 


528 


Qy 


181 


Db 


585 


Qy 


241 


Db 


630 


Qy 


301 


Db 


686 


Qy 


360 


Db 


735 


Qy 


420 



III hi II ll::h: I III I I I 



"E--N--AY-I- 

I I :: I 



■-CTCPK 629 

I II: 



^NCEINIDDC-KR-KPCDY-GKCIDKING-YECVCEPGYSGSMCNINIDDCALNPC 685 
:H ::: I h :||: Ml III I M ||::|: |: till | 



HNGGTCIDGVNSFTCLC-PD--G-FRDATCL-S-Q-HN-E-CSSNPCIHGSCLD-QINS- 734 

:|||:|:||: 1= III I I : : : : :; : I ::| :| |: | :| 



I I I I :M 



RESULT 6
ID NOTC_XENLA STANDARD; PRT; 2524 AA.
AC P21783;
DT 01-MAY-1991 (REL. 18, CREATED)
DT 01-OCT-1996 (REL. 34, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG PRECURSOR (XOTCH PROTEIN).
GN XOTCH.
OS XENOPUS LAEVIS (AFRICAN CLAWED FROG).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; AMPHIBIA; BATRACHIA; ANURA;
OC MESOBATRACHIA; PIPOIDEA; PIPIDAE; XENOPODINAE; XENOPUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 90385285.
RA COFFMAN C., HARRIS W., KINTNER C.;
RT "Xotch, the Xenopus homolog of Drosophila notch.";
RL SCIENCE 249:1438-1441(1990).
RN [2]
RP REVISIONS TO 1759-1782.
RA KINTNER C.;
RL SUBMITTED (JUN-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DEVELOPMENTAL STAGE: EXPRESSED ALMOST UNIFORMLY IN EARLY EMBRYOS.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; M33874; G1364263; -.
DR PIR; A35844; A35844.
DR PROSITE; PS00010; ASX_HYDROXYL; 23.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 29.
DR PROSITE; PS01187; EGF_CA; 21.
DR PFAM; PF00008; EGF; 36.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.



FT   SIGNAL        1     19       POTENTIAL.
FT   CHAIN        20   2524       NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG.
FT   DOMAIN       20   1728       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM   1729   1750       POTENTIAL.
FT   DOMAIN     1751   2524       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       20     57       EGF-LIKE 1.
FT   DOMAIN       58     99       EGF-LIKE 2.
FT   DOMAIN      102    140       EGF-LIKE 3.
FT   DOMAIN      141    177       EGF-LIKE 4.
FT   DOMAIN      179    215       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      217    254       EGF-LIKE 6.
FT   DOMAIN      256    292       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      294    332       EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      334    370       EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      371    409       EGF-LIKE 10.
FT   DOMAIN      411    449       EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      451    487       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      489    525       EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      527    563       EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      565    600       EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      602    638       EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      640    675       EGF-LIKE 17.
FT   DOMAIN      677    713       EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      715    750       EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      752    788       EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      790    826       EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      828    866       EGF-LIKE 22.
FT   DOMAIN      868    904       EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      906    942       EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      944    980       EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      982   1018       EGF-LIKE 26.
FT   DOMAIN     1020   1056       EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1058   1094       EGF-LIKE 28.
FT   DOMAIN     1096   1142       EGF-LIKE 29.
FT   DOMAIN     1144   1180       EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1182   1218       EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1220   1264       EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1266   1304       EGF-LIKE 33.
FT   DOMAIN     1306   1346       EGF-LIKE 34.
FT   DOMAIN     1347   1383       EGF-LIKE 35.
FT   DOMAIN     1386   1424       EGF-LIKE 36.
FT   DOMAIN     1441   1560       3 X LIN/NOTCH REPEATS.
FT   REPEAT     1441   1478       LIN/NOTCH 1.
FT   REPEAT     1479   1520       LIN/NOTCH 2.
FT   REPEAT     1521   1560       LIN/NOTCH 3.
FT   DOMAIN     1871   2083       6 X ANK MOTIF REPEATS.
FT   DISULFID     22     35       BY SIMILARITY.
FT   DISULFID     29     45       BY SIMILARITY.
FT   DISULFID     47     56       BY SIMILARITY.
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FT DISOLFID 

                  62     74       BY SIMILARITY.
FT   DISULFID     68     87       BY SIMILARITY.
FT   DISULFID     89     98       BY SIMILARITY.
FT   DISULFID    106    117       BY SIMILARITY.
FT   DISULFID    111    128       BY SIMILARITY.
FT   DISULFID    130    139       BY SIMILARITY.
FT   DISULFID    145    156       BY SIMILARITY.
FT   DISULFID    150    165       BY SIMILARITY.
FT   DISULFID    167    176       BY SIMILARITY.
FT   DISULFID    183    194       BY SIMILARITY.
FT   DISULFID    188    203       BY SIMILARITY.
FT   DISULFID    205    214       BY SIMILARITY.
FT   DISULFID    221    232       BY SIMILARITY.
FT   DISULFID    226    242       BY SIMILARITY.
FT   DISULFID    244    253       BY SIMILARITY.
FT   DISULFID    260    271       BY SIMILARITY.
FT   DISULFID    265    280       BY SIMILARITY.
FT   DISULFID    282    291       BY SIMILARITY.
FT   DISULFID    298    311       BY SIMILARITY.
FT   DISULFID    305    320       BY SIMILARITY.
FT   DISULFID    322    331       BY SIMILARITY.
FT   DISULFID    338    349       BY SIMILARITY.
FT   DISULFID    343    358       BY SIMILARITY.
FT   DISULFID    360    369       BY SIMILARITY.
FT   DISULFID    375    386       BY SIMILARITY.
FT   DISULFID    380    397       BY SIMILARITY.
FT   DISULFID    399    408       BY SIMILARITY.
FT   DISULFID    415    428       BY SIMILARITY.
FT   DISULFID    422    437       BY SIMILARITY.
FT   DISULFID    439    448       BY SIMILARITY.
FT   DISULFID    455    466       BY SIMILARITY.
FT   DISULFID    460    475       BY SIMILARITY.
FT   DISULFID    477    486       BY SIMILARITY.
FT   DISULFID    493    504       BY SIMILARITY.
FT   DISULFID    498    513       BY SIMILARITY.
FT   DISULFID    515    524       BY SIMILARITY.
FT   DISULFID    531    542       BY SIMILARITY.
FT   DISULFID    536    551       BY SIMILARITY.
FT   DISULFID    553    562       BY SIMILARITY.
FT   DISULFID    569    579       BY SIMILARITY.
FT   DISULFID    574    588       BY SIMILARITY.
FT   DISULFID    590    599       BY SIMILARITY.
FT   DISULFID    606    617       BY SIMILARITY.
FT   DISULFID    611    626       BY SIMILARITY.
FT   DISULFID    628    637       BY SIMILARITY.
FT   DISULFID    644    654       BY SIMILARITY.
FT   DISULFID    649    663       BY SIMILARITY.
FT   DISULFID    665    674       BY SIMILARITY.
FT   DISULFID    681    692       BY SIMILARITY.
FT   DISULFID    686    701       BY SIMILARITY.
FT   DISULFID    703    712       BY SIMILARITY.
FT   DISULFID    719    729       BY SIMILARITY.
FT   DISULFID    724    738       BY SIMILARITY.
FT   DISULFID    740    749       BY SIMILARITY.
FT   DISULFID    756    767       BY SIMILARITY.
FT   DISULFID    761    776       BY SIMILARITY.
FT   DISULFID    778    787       BY SIMILARITY.
FT   DISULFID    794    805       BY SIMILARITY.
FT   DISULFID    799    814       BY SIMILARITY.
FT   DISULFID    816    825       BY SIMILARITY.
FT   DISULFID    832    843       BY SIMILARITY.
FT   DISULFID    837    854       BY SIMILARITY.
FT   DISULFID    856    865       BY SIMILARITY.
FT   DISULFID    872    883       BY SIMILARITY.
FT   DISULFID    877    892       BY SIMILARITY.
FT   DISULFID    894    903       BY SIMILARITY.
FT   DISULFID    910    921       BY SIMILARITY.
FT   DISULFID    915    930       BY SIMILARITY.
FT   DISULFID    932    941       BY SIMILARITY.
FT   DISULFID    986    997       BY SIMILARITY.
FT   DISULFID    991   1006       BY SIMILARITY.
FT   DISULFID   1008   1017       BY SIMILARITY.
FT   DISULFID   1024   1035       BY SIMILARITY.
FT   DISULFID   1029   1044       BY SIMILARITY.
FT   DISULFID   1046   1055       BY SIMILARITY.
FT   DISULFID   1062   1073       BY SIMILARITY.
FT   DISULFID   1067   1082       BY SIMILARITY.
FT   DISULFID   1084   1093       BY SIMILARITY.
FT   DISULFID   1100   1121       BY SIMILARITY.
FT   DISULFID   1115   1130       BY SIMILARITY.
FT   DISULFID   1132   1141       BY SIMILARITY.
FT   DISULFID   1148   1159       BY SIMILARITY.
FT   DISULFID   1153   1168       BY SIMILARITY.
FT   DISULFID   1170   1179       BY SIMILARITY.
FT   DISULFID   1186   1197       BY SIMILARITY.
FT   DISULFID   1191   1206       BY SIMILARITY.
FT   DISULFID   1208   1217       BY SIMILARITY.
FT   DISULFID   1224   1243       BY SIMILARITY.
FT   DISULFID   1237   1252       BY SIMILARITY.
FT   DISULFID   1254   1263       BY SIMILARITY.
FT   DISULFID   1270   1283       BY SIMILARITY.
FT   DISULFID   1275   1292       BY SIMILARITY.
FT   DISULFID   1294   1303       BY SIMILARITY.
FT   DISULFID   1310   1321       BY SIMILARITY.
FT   DISULFID   1315   1333       BY SIMILARITY.
FT   DISULFID   1335   1344       BY SIMILARITY.
FT   DISULFID   1351   1362       BY SIMILARITY.
FT   DISULFID   1356   1371       BY SIMILARITY.
FT   DISULFID   1373   1382       BY SIMILARITY.
FT   DISULFID   1390   1401       BY SIMILARITY.
FT   DISULFID   1395   1412       BY SIMILARITY.
FT   DISULFID   1414   1423       BY SIMILARITY.
FT   CARBOHYD    462    462       POTENTIAL.
FT   CARBOHYD    887    887       POTENTIAL.



Note: remainder of annotations omitted. 

Query Match 10.0%; Score 542; DB 1; Length 2524; 

Best Local Similarity 37.3%; Pred. No. 2.61e-93; 

Matches 95; Conservative 53; Mismatches 76; Indels 31; Gaps 15; 
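
The percentages in this summary block can be checked against the counts printed beside them. The sketch below is an illustrative reading of the figures, not a description of the search program's internals: it assumes that Best Local Similarity is the number of identical matches divided by the total number of alignment columns, and that Query Match is the hit score divided by 5438, the value printed in the run header, which is taken here to be the query's self-match score. The variable and function names are hypothetical.

```python
# Counts copied from the RESULT 6 summary lines above.
matches, conservative, mismatches, indels = 95, 53, 76, 31
score = 542

# Assumption: 5438 (printed in the run header) is the score of the
# query aligned against itself, i.e. the best attainable score.
perfect_score = 5438


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Assumption: every alignment column is counted exactly once as an identity,
# a conservative substitution, a mismatch, or an indel.
aligned_columns = matches + conservative + mismatches + indels

best_local_similarity = percent(matches, aligned_columns)
query_match = percent(score, perfect_score)

print(f"aligned columns:       {aligned_columns}")
print(f"best local similarity: {best_local_similarity}%")
print(f"query match:           {query_match}%")
```

Under these assumptions the computed values agree with the 37.3% and 10.0% printed for this hit, and the same arithmetic reproduces the figures reported for the neighbouring results.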

Db 415 CSLGAN-PCEHGGRCTNTLGS-FQCNCPQGYAGPRCEIDVNECLSNPCQNDSTC-LDQIG 471 

II I II : : I I : : III I: I :|| ::: I ::|| |::|| : I I 
Qy 180 CDLCLNSPCKNNAICEITSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAG 239 

Db 472 EFQCICMPGYEGLYCETNIDECASNPCLHNGKCID K INEFRCDCP 516 

I I I 1:11 III III:! :: I : MM | || :||||| 

Qy 240 RFNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCP 299 

Db 517 TGFSGNLCQHDFDECTST--PCKNGAKCLDGPNSYTCQCTEGFTGRHCEQDINECIPDPC 574 

: I: I: II II I :||: ||:| |: Mil :|| :|::| I 
Qy 300 MEYEGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVEC 359 

Db 575 HYG-TCKDGIATFTCLCRPGYTGRLCD-NDI-N-ECL-SKPCL-N--G-GQCTDRENG-- 623 

: I :| III :: llllllkl: |: : : I : :| : I |:| :|; 
Qy 360 QNGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNSSD 419 



Db 624 YICTCPKGTTGVNCE 638 

: I I I :| :|: 
Qy 420 FTCKCHEGFSGPSCD 434 



RESULT 7 

ID DLL1_MOUSE STANDARD; PRT; 722 AA. 

AC Q61483; 

DT 01-NOV-1997 (REL. 35, CREATED) 

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE) 

DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE) 

DE DELTA-LIKE PROTEIN 1 PRECURSOR (DELTA1). 

GN DLL1. 

OS MUS MUSCULUS (MOUSE). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=BALB/C X C57BL/6; TISSUE=EMBRYO; 
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RX MEDLINE; 95401858. 

RA BETTENHAUSEN B., DE ANGELIS M.H., SIMON D., GUENET J.-L., GOSSLER A.;
RT "Transient and restricted expression during mouse embryogenesis of
RT Dll1, a murine gene closely related to Drosophila Delta.";
RL DEVELOPMENT 121:2407-2418(1995).
CC -!- FUNCTION: MAY BE INVOLVED IN CELL-TO-CELL COMMUNICATION IN
CC MAMMALIAN EMBRYOS. MAY HAVE A ROLE IN CELLULAR INTERACTIONS
CC UNDERLYING SOMITOGENESIS AND DEVELOPMENT OF THE NERVOUS SYSTEM.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- TISSUE SPECIFICITY: IN THE EMBRYO, EXPRESSED IN THE PARAXIAL
CC MESODERM AND NERVOUS SYSTEM. EXPRESSED AT HIGH LEVELS IN ADULT
CC HEART AND AT LOWER LEVELS, IN ADULT LUNG.
CC -!- DEVELOPMENTAL STAGE: EXPRESSED UNTIL DAY 15 IN THE EMBRYO.
CC EXPRESSION THEN DECREASES AND INCREASES AGAIN IN THE ADULT.
CC -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: TO DROSOPHILA DELTA PROTEIN.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; X80903; G806570; -.
DR MGD; MGI:104659; DLL1.
DR PROSITE; PS00010; ASX_HYDROXYL; 3.
DR PROSITE; PS00022; EGF_1; 8.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 2.
DR PFAM; PF00008; EGF; 6.
DR HSSP; P00740; 1IXA.
KW SIGNAL; EGF-LIKE DOMAIN; GLYCOPROTEIN; TRANSMEMBRANE.



FT   SIGNAL        1     17       POTENTIAL.
FT   CHAIN        18    722       DELTA-LIKE PROTEIN 1.
FT   DOMAIN       18    545       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    546    568       POTENTIAL.
FT   DOMAIN      569    722       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN      225    253       EGF-LIKE 1.
FT   DOMAIN      256    284       EGF-LIKE 2.
FT   DOMAIN      291    324       EGF-LIKE 3.
FT   DOMAIN      331    362       EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      369    401       EGF-LIKE 5.
FT   DOMAIN      408    439       EGF-LIKE 6.
FT   DOMAIN      446    477       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      484    515       EGF-LIKE 8.
FT   DISULFID    225    236       BY SIMILARITY.
FT   DISULFID    229    242       BY SIMILARITY.
FT   DISULFID    244    253       BY SIMILARITY.
FT   DISULFID    256    267       BY SIMILARITY.
FT   DISULFID    262    273       BY SIMILARITY.
FT   DISULFID    275    284       BY SIMILARITY.
FT   DISULFID    291    303       BY SIMILARITY.
FT   DISULFID    297    313       BY SIMILARITY.
FT   DISULFID    315    324       BY SIMILARITY.
FT   DISULFID    331    342       BY SIMILARITY.
FT   DISULFID    336    351       BY SIMILARITY.
FT   DISULFID    353    362       BY SIMILARITY.
FT   DISULFID    369    380       BY SIMILARITY.
FT   DISULFID    374    390       BY SIMILARITY.
FT   DISULFID    392    401       BY SIMILARITY.
FT   DISULFID    408    419       BY SIMILARITY.
FT   DISULFID    413    428       BY SIMILARITY.
FT   DISULFID    430    439       BY SIMILARITY.
FT   DISULFID    446    466       BY SIMILARITY.
FT   DISULFID    468    477       BY SIMILARITY.
FT   DISULFID    484    495       BY SIMILARITY.
FT   DISULFID    489    504       BY SIMILARITY.
FT   DISULFID    506    515       BY SIMILARITY.
FT   CARBOHYD    476    476       POTENTIAL.



SQ SEQUENCE 722 AA; 78448 MW; 5A647702 CRC32; 



Query Match 9.7%; Score 527; DB 1; Length 722;

Best Local Similarity 35.6%; Pred. No. 8.40e-90;

Matches 89; Conservative 54; Mismatches 80; Indels 27; Gaps 20; 

Db 294 HKPCRNGATCTNTGQGSYTCSCRPGYTGANCELEVDECAPSPCKNGASCTDLEDS-FSCT 352

: I: I I |: |||:| ||: I :|| ::| I III I |:| : : |:| 
Qy 185 NSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFNCY 244 

Db 353 CPPGFYGKVCE-LSA-M-T-CADGP-CFNGGR-CSD--N--P-D-GGYTCHCPLGFSG 398 

III I II : : : I :| I : I II: I : : :| Ml: : I 
Qy 245 CNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEYEG 304 

Db 399 FNCEKKMDLCGS-SPCSNGAKCVDLGNSYLCRCQAGFSGRYCEDNVDDCASSPCANGGT 456 

:M I:: I :ll I :||: : II I I :lhl II hill : I 111: 
Qy 305 KHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNGGS 364 

Db 457 CRDSVNDFSCTCPPGYTGKNCS-APV-SR-CEHA-PCH-N-G-ATC--HQRGQRYMCEC 506 

II:: : I I MM I :|: : : : I : : I : I I : : I I 
Qy 365 CVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNSSDFTCKC 424 

Db 507 AQGYGGPNCQ 516 

:|::||:|: 
Qy 425 HEGFSGPSCD 434 
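
Note on the summary figures: the percentages printed with each alignment appear to follow directly from the counts on the "Matches" line. The sketch below is only an illustration under two assumptions that the listing does not state explicitly: that "Best Local Similarity" is the number of identical matches divided by all aligned columns (matches, conservative substitutions, mismatches and indels), and that "Query Match" is the hit score divided by the query's self-score, taken here to be the value 5438 printed in the run header. The function names are hypothetical.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent identity over all aligned columns (assumed definition)."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


def query_match(score, query_self_score):
    """Hit score as a percentage of the query self-score (assumed definition)."""
    return 100.0 * score / query_self_score


# Counts copied from the "Matches" lines of three results in this listing.
results = {
    "score 527 hit": (89, 54, 80, 27),
    "DLL1_HUMAN": (88, 56, 79, 27),
    "CRB_DROME": (100, 80, 116, 32),
}

for name, counts in results.items():
    print(f"{name}: {best_local_similarity(*counts):.1f}%")

# 5438 is assumed to be the query's self-alignment score.
print(f"Query match for score 527: {query_match(527, 5438):.1f}%")
```

Under these assumptions the computed values agree with the figures printed for the corresponding results (35.6%, 35.2%, 30.5% and 9.7%).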



RESULT 8 

ID DLL1_HUMAN STANDARD; PRT; 723 AA.

AC O00548;

DT 15-JUL-1998 (REL. 36, CREATED) 

DT 15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE DELTA-LIKE PROTEIN 1 PRECURSOR (DELTA1).

GN DLL1.

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A.

RA MANN R.S., GRAY G.E., HENRIQUE D., ISH-HOROWICZ D.,

RA ARTAVANIS-TSAKONAS S.;

RL SUBMITTED (MAY-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.

CC -!- FUNCTION: MAY BE INVOLVED IN CELL-TO-CELL COMMUNICATION IN

CC MAMMALIAN EMBRYOS. MAY HAVE A ROLE IN CELLULAR INTERACTIONS 

CC UNDERLYING SOMITOGENESIS AND DEVELOPMENT OF THE NERVOUS SYSTEM (BY 

CC SIMILARITY).

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: TO DROSOPHILA DELTA PROTEIN. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC

DR EMBL; AF003522; G2197069; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 3.

DR PROSITE; PS00022; EGF_1; 8.

DR PROSITE; PS01186; EGF_2; 8.

DR PROSITE; PS01187; EGF_CA; 1.

DR PFAM; PF00008; EGF; 6.

DR HSSP; P00740; 1IXA.

KW SIGNAL; EGF-LIKE DOMAIN; GLYCOPROTEIN; TRANSMEMBRANE.


FT   SIGNAL        1     17       POTENTIAL.
FT   CHAIN        18    723       DELTA-LIKE PROTEIN 1.
FT   DOMAIN       18    545       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    546    568       POTENTIAL.
FT   DOMAIN      569    723       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN      226    254       EGF-LIKE 1.
FT   DOMAIN      257    285       EGF-LIKE 2.
FT   DOMAIN      292    325       EGF-LIKE 3.
FT   DOMAIN      332    363       EGF-LIKE 4, CALCIUM BINDING (POTENTIAL).
FT   DOMAIN      370    402       EGF-LIKE 5.
FT   DOMAIN      409    440       EGF-LIKE 6.
FT   DOMAIN      447    478       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      485    516       EGF-LIKE 8.
FT   DISULFID    226    237       BY SIMILARITY.
FT   DISULFID    230    243       BY SIMILARITY.
FT   DISULFID    245    254       BY SIMILARITY.
FT   DISULFID    257    268       BY SIMILARITY.
FT   DISULFID    263    274       BY SIMILARITY.
FT   DISULFID    276    285       BY SIMILARITY.
FT   DISULFID    292    304       BY SIMILARITY.
FT   DISULFID    298    314       BY SIMILARITY.
FT   DISULFID    316    325       BY SIMILARITY.
FT   DISULFID    332    343       BY SIMILARITY.
FT   DISULFID    337    352       BY SIMILARITY.
FT   DISULFID    354    363       BY SIMILARITY.
FT   DISULFID    370    381       BY SIMILARITY.
FT   DISULFID    375    391       BY SIMILARITY.
FT   DISULFID    393    402       BY SIMILARITY.
FT   DISULFID    409    420       BY SIMILARITY.
FT   DISULFID    414    429       BY SIMILARITY.
FT   DISULFID    431    440       BY SIMILARITY.
FT   DISULFID    447    467       BY SIMILARITY.
FT   DISULFID    469    478       BY SIMILARITY.
FT   DISULFID    485    496       BY SIMILARITY.
FT   DISULFID    490    505       BY SIMILARITY.
FT   DISULFID    507    516       BY SIMILARITY.
FT   CARBOHYD    477    477       POTENTIAL.


SQ SEQUENCE 723 AA; 77956 MW; A1D48BDB CRC32;



Query Match 9.7%; Score 529; DB 1; Length 723; 

Best Local Similarity 35.2%; Pred. No. 2.87e-90;

Matches 88; Conservative 56; Mismatches 79; Indels 27; Gaps 20;


Db 295 HKPCKNGATCTKTGQGSYTCSCRPGYTGATCELGIDECDPSPCKNGGSCTDLENS-YSCT 353
: llll 1 1 1: MM II: 1 II Ml III 1 ::| : : ::| 
Qy 185 NSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFNCY 244

Db 354 CPPGFYGKICE--LSA-M-T-CADGP-CFNGGR-CSD---S--P-D-GGYSCRCPVGYSG 399
1 II 1 II : : : 1 :! 1 : Ml: : : : :M M M 
Qy 245 CNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEYEG 304

Db 400 FNCEKKIDYCSS--SPCSNGAKCVDLGDAYLCRCQAGFSGRHCDDNVDDCASSPCANGGT 457
ill Mil: :M 1 :M: : :| 1 1 :||:| :|: Mil : 1 III: 
Qy 305 KHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNGGS 364

Db 458 CRDGVNDFSCTCPPGYTGRNCS-APV-SR-CEHA-PCH-N--G-ATC--HERGHGYVCEC 507
III: Ml llhl: 1 :|: : : :|: : M 1 : : : || 
Qy 365 CVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNSSDFTCKC 424

Db 508 ARGYGGPNCQ 517
|::N:|: 
Qy 425 HEGFSGPSCD 434



RESULT 9 

ID CRB_DROME STANDARD; PRT; 2139 AA. 

AC P10040; 

DT 01-MAR-1989 (REL. 10, CREATED) 

DT 01-MAY-1991 (REL. 18, LAST SEQUENCE UPDATE)

DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE)

DE CRUMBS PROTEIN PRECURSOR (95F).

GN CRB.

OS DROSOPHILA MELANOGASTER (FRUIT FLY).

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA. 

RN [1] 

RP SEQUENCE FROM N.A.

RC STRAIN=OREGON-R; TISSUE=EMBRYO;

RX MEDLINE; 90263104.

RA TEPASS U., THERES C., KNUST E.;



RT "Crumbs encodes an EGF-like protein expressed on apical membranes of 

RT Drosophila epithelial cells and required for organization of 

RT epithelia.";

RL CELL 61:787-799(1990). 

RN [2] 

RP SEQUENCE OF 1663-1955 FROM N.A. 

RX MEDLINE; 87218537.

RA KNUST E., DIETRICH U., TEPASS U., BREMER K.A., WEIGEL D.,

RA VAESSIN H., CAMPOS-ORTEGA J.A.;

RT "EGF homologous sequences encoded in the genome of Drosophila 

RT melanogaster, and their relation to neurogenic genes."; 

RL EMBO J. 6:761-766(1987). 

CC -!- FUNCTION: MAY PLAY A ROLE IN THE DEVELOPMENT OF EPITHELIA, 

CC POSSIBLY FOR THE ESTABLISHMENT AND/OR MAINTENANCE OF CELL 

CC POLARITY. IT MAY ACT AS A SIGNAL.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- PTM: PHOSPHORYLATED IN THE CYTOPLASMIC DOMAIN (POTENTIAL).

CC -!- SIMILARITY: CONTAINS 29 EGF-LIKE DOMAINS. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; M33753; G552087; ALT SEQ. 

DR EMBL; X05144; E1746; -. 

DR EMBL; X05144; G929536; -. 

DR PIR; B26637; B26637. 

DR PIR; A35672; A35672.

DR FLYBASE; FBgn0000368; crb.

DR PROSITE; PS00010; ASX_HYDROXYL; 15.

DR PROSITE; PS00022; EGF_1; 26.

DR PROSITE; PS01186; EGF_2; 17.

DR PROSITE; PS01187; EGF_CA; 15.

DR PFAM; PF00008; EGF; 26.

DR PFAM; PF00054; laminin_G; 3.

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE; 

KW GLYCOPROTEIN; SIGNAL; PHOSPHORYLATION. 



FT   SIGNAL        1     90
FT   CHAIN        91   2139       CRUMBS PROTEIN.
FT   DOMAIN       91   2084       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM   2085   2111       POTENTIAL.
FT   DOMAIN     2112   2139       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN      267    303       EGF-LIKE 1.
FT   DOMAIN      306    343       EGF-LIKE 2.
FT   DOMAIN      348    386       EGF-LIKE 3.
FT   DOMAIN      388    425       EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      427    463       EGF-LIKE 5.
FT   DOMAIN      464    500       EGF-LIKE 6.
FT   DOMAIN      501    532       EGF-LIKE 7.
FT   DOMAIN      545    581       EGF-LIKE 8.
FT   DOMAIN      582    611       EGF-LIKE 9.
FT   DOMAIN      609    646       EGF-LIKE 10.
FT   DOMAIN      648    685       EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      687    723       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      725    761       EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      763    800       EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      802    838       EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      840    902       EGF-LIKE 16.
FT   DOMAIN      904    940       EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      942    978       EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      980   1021       EGF-LIKE 19.
FT   DOMAIN     1207   1243       EGF-LIKE 20.
FT   DOMAIN     1481   1517       EGF-LIKE 21.
FT   DOMAIN     1759   1795       EGF-LIKE 22.
FT   DOMAIN     1797   1833       EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1835   1871       EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1874   1915       EGF-LIKE 25.
FT   DOMAIN     1915   1951       EGF-LIKE 26.
FT   DOMAIN     1953   1989       EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1991   2029       EGF-LIKE 28, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     2030   2070       EGF-LIKE 29.


FT   DISULFID    271    282       BY SIMILARITY.
FT   DISULFID    276    291       BY SIMILARITY.
FT   DISULFID    293    302       BY SIMILARITY.
FT   DISULFID    310    321       BY SIMILARITY.
FT   DISULFID    315    331       BY SIMILARITY.
FT   DISULFID    333    342       BY SIMILARITY.
FT   DISULFID    352    363       BY SIMILARITY.
FT   DISULFID    357    374       BY SIMILARITY.
FT   DISULFID    376    385       BY SIMILARITY.
FT   DISULFID    392    403       BY SIMILARITY.
FT   DISULFID    397    412       BY SIMILARITY.
FT   DISULFID    414    424       BY SIMILARITY.
FT   DISULFID    431    442       BY SIMILARITY.
FT   DISULFID    436    451       BY SIMILARITY.
FT   DISULFID    453    462       BY SIMILARITY.
FT   DISULFID    468    479       BY SIMILARITY.
FT   DISULFID    473    488       BY SIMILARITY.
FT   DISULFID    490    499       BY SIMILARITY.
FT   DISULFID    505    515       BY SIMILARITY.
FT   DISULFID    509    520       BY SIMILARITY.
FT   DISULFID    522    531       BY SIMILARITY.
FT   DISULFID    549    562       BY SIMILARITY.
FT   DISULFID    556    569       BY SIMILARITY.
FT   DISULFID    571    580       BY SIMILARITY.
FT   DISULFID    586    597       BY SIMILARITY.
FT   DISULFID    591    602       BY SIMILARITY.
FT   DISULFID    604    610       BY SIMILARITY.
FT   DISULFID    613    624       BY SIMILARITY.
FT   DISULFID    618    634       BY SIMILARITY.
FT   DISULFID    636    645       BY SIMILARITY.
FT   DISULFID    652    664       BY SIMILARITY.
FT   DISULFID    659    673       BY SIMILARITY.
FT   DISULFID    675    684       BY SIMILARITY.
FT   DISULFID    691    702       BY SIMILARITY.
FT   DISULFID    696    711       BY SIMILARITY.






FT   DISULFID    713              BY SIMILARITY.
FT   DISULFID    729    740       BY SIMILARITY.
FT   DISULFID    734    749       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID    767    778       BY SIMILARITY.
FT   DISULFID    772    787       BY SIMILARITY.
FT   DISULFID           799       BY SIMILARITY.
FT   DISULFID    806    817       BY SIMILARITY.
FT   DISULFID    811    826       BY SIMILARITY.
FT   DISULFID    828    837       BY SIMILARITY.
FT   DISULFID    844    855       BY SIMILARITY.
FT   DISULFID    849    890       BY SIMILARITY.
FT   DISULFID    892    901       BY SIMILARITY.
FT   DISULFID    908    919       BY SIMILARITY.


FT   DISULFID    913    928       BY SIMILARITY.
FT   DISULFID    930    939       BY SIMILARITY.
FT   DISULFID    946    957       BY SIMILARITY.
FT   DISULFID    952    966       BY SIMILARITY.
FT   DISULFID    968    977       BY SIMILARITY.
FT   DISULFID    984    995       BY SIMILARITY.
FT   DISULFID    989   1009       BY SIMILARITY.
FT   DISULFID   1011   1020       BY SIMILARITY.
FT   DISULFID   1211   1222       BY SIMILARITY.
FT   DISULFID   1216   1231       BY SIMILARITY.
FT   DISULFID   1233   1242       BY SIMILARITY.
FT   DISULFID   1485   1496       BY SIMILARITY.
FT   DISULFID   1490   1505       BY SIMILARITY.
FT   DISULFID   1507   1516       BY SIMILARITY.
FT   DISULFID   1763   1774       BY SIMILARITY.
FT   DISULFID   1768   1783       BY SIMILARITY.
FT   DISULFID   1785   1794       BY SIMILARITY.
FT   DISULFID   1801   1812       BY SIMILARITY.
FT   DISULFID   1806   1821       BY SIMILARITY.
FT   DISULFID   1823   1832       BY SIMILARITY.
FT   DISULFID   1839   1850       BY SIMILARITY.





FT   DISULFID          Join       BY SIMILARITY.
FT   DISULFID   1861   1870       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID   1Q01   1903       BY SIMILARITY.
FT   DISULFID          loin       BY SIMILARITY.
FT   DISULFID   1Q1Q              BY SIMILARITY.
FT   DISULFID   1924   1939       BY SIMILARITY.
FT   DISULFID   1941   1950       BY SIMILARITY.
FT   DISULFID          1968       BY SIMILARITY.
FT   DISULFID   1962              BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.


FT   DISULFID   1995   2008       BY SIMILARITY.
FT   DISULFID   2002   2017       BY SIMILARITY.
FT   DISULFID   2019   2028       BY SIMILARITY.
FT   CARBOHYD     37     37       POTENTIAL.
FT   CARBOHYD     96     96       POTENTIAL.
FT   CARBOHYD    198    198       POTENTIAL.
FT   CARBOHYD    238    238       POTENTIAL.
FT   CARBOHYD    239    239       POTENTIAL.
FT   CARBOHYD    336    336       POTENTIAL.
FT   CARBOHYD    400    400       POTENTIAL.
FT   CARBOHYD    550    550       POTENTIAL.
FT   CARBOHYD    565    565       POTENTIAL.
FT   CARBOHYD    736    736       POTENTIAL.
FT   CARBOHYD    746    746       POTENTIAL.
FT   CARBOHYD    860    860       POTENTIAL.
FT   CARBOHYD    884    884       POTENTIAL.
FT   CARBOHYD    976    976       POTENTIAL.
FT   CARBOHYD   1102   1102
FT   CARBOHYD   1114   1114       POTENTIAL.
FT   CARBOHYD   1138   1138       POTENTIAL.
FT   CARBOHYD   1192   1192       POTENTIAL.
FT   CARBOHYD   1245   1245       POTENTIAL.
FT   CARBOHYD   1255   1255       POTENTIAL.
FT   CARBOHYD   1354   1354       POTENTIAL.
FT   CARBOHYD   1363   1363       POTENTIAL.
FT   CARBOHYD   1441   1441       POTENTIAL.
FT   CARBOHYD   1454   1454       POTENTIAL.



Note: remainder of annotations omitted. 



Query Match 9.7%; Score 529; DB 1; Length 2139;

Best Local Similarity 30.5%; Pred. No. 2.87e-90;

Matches 100; Conservative 80; Mismatches 116; Indels 32; Gaps 25; 

Db 747 FTCNC-IPGMRGRICDIDIDDCVGDPCLNGGQCIDOLG-GFRCDCSGTGYEGENCELNID 804 

Hi: :l : : II: : I! I : I ; ; |:|: ; | :|| || 
Qy 166 FTCDSKVPTRLATRCDL----CLNSPCKNNAICETTSSRKYTCNCT-PGFYGVHCENQID 220

Db 805 ECLSNPCTNGAKC-LDRVKDYFCDCHNGYKGRNCEQDINECESNPCQYNGNCLERSNITL 863 

I ::|| III:: : I |::|: I I! :|::| :: |: |:|:: : 
Qy 221 ACYGSPCLNNATCKVAQAGRFNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDL--VR- 277 

Db 864 YQMSRITDLPKVFSQPFSFENASGYECVCVPGIIGKNCEININECDS--NPCSKHGNCND 921

: :| I I I:| : I II I I ||:|| ::: I III ::|:| 
Qy 278 F--CS-EEL-KNF-QSFQI-N-S-YRCDCPMEYEGKHCEDKLEYCTKKLNPCENNGKCIP 329

Db 922 GIGTYTCECEPGFEGTHCEINIDECDRYNPCQRG-TCYDQIDDYDCDCDANYGGRNCSV- 979 

1:1:1 I III I :H IN : 1 1 I : I I I I ! I I : I : I I ; 
Qy 330 INGSYSCMCSPGFTGNNCETNIDDC-KNVECQNGGSCVDGILSYDCLCRPGYAGQYCEIP 388

Db 980 -LL-KGCDQ-NPCLNGGAC-LP-YLINEVTHLYTCTCENGFQGDKCEKTTTLS-MVATSL 1033 

:: : : ::| : || : :: : :|| | :|| | |:: :::::: 
Qy 389 PMMDMEYQKTDACQQS-ACGQGECVASQNSSDFTCKCHEGFSGPSCDRQMSVGFRNPGAY 447 

Db 1034 ISVTTEREEGYDINLQFRTTLPNGVLAF 1061 

::: :| I : :||| |:| : 
Qy 448 LALDPLASDG-TITMTLRTTSKIGILLY 474 



RESULT 10 

ID DL_DROME STANDARD; PRT; 880 AA.
AC P10041;
DT 01-MAR-1989 (REL. 10, CREATED)
DT 01-MAR-1989 (REL. 10, LAST SEQUENCE UPDATE)
DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS DELTA PROTEIN PRECURSOR.
GN DL.
OS DROSOPHILA MELANOGASTER (FRUIT FLY).
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.
RN [1]
RP SEQUENCE FROM N.A.
RA VAESSIN H., BREMER K.A., KNUST E., CAMPOS-ORTEGA J.A.;
RT "The neurogenic gene Delta of Drosophila melanogaster is expressed in
RT neurogenic territories and encodes a putative transmembrane protein
RT with EGF-like repeats.";
RL EMBO J. 6:3431-3440(1987).
RN [2]
RP SEQUENCE OF 422-621 FROM N.A.
RX MEDLINE; 87218537.
RA KNUST E., DIETRICH U., TEPASS U., BREMER K.A., WEIGEL D., VAESSIN H.,
RA CAMPOS-ORTEGA J.A.;
RT "EGF homologous sequences encoded in the genome of Drosophila
RT melanogaster, and their relation to neurogenic genes.";
RL EMBO J. 6:761-766(1987).
RN [3]
RP PATTERN OF TRANSCRIPTION.
RX MEDLINE; 91209246.
RA HAENLIN M., KRAMATSCHEK B., CAMPOS-ORTEGA J.A.;
RT "The pattern of transcription of the neurogenic gene Delta of
RL DEVELOPMENT 110:905-914(1990).
CC -!- FUNCTION: ESSENTIAL FOR PROPER DIFFERENTIATION OF ECTODERM. DL
CC IS REQUIRED FOR THE CORRECT SEPARATION OF NEURAL AND EPIDERMAL
CC CELL LINEAGES.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- SEPARATION OF NEUROBLASTS FROM THE ECTODERM INTO THE INNER PART
CC OF EMBRYO IS ONE OF THE FIRST STEPS OF CNS DEVELOPMENT IN INSECTS.
CC THIS PROCESS IS UNDER CONTROL OF THE NEUROGENIC GENES.
CC -!- NOTCH AND SERRATE MAY INTERACT AT THE PROTEIN LEVEL. IT IS
CC CONCEIVABLE THAT THE SERRATE AND DELTA PROTEINS MAY COMPETE
CC FOR BINDING WITH THE NOTCH PROTEIN.
CC -!- SIMILARITY: CONTAINS 9 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: TO DROSOPHILA SERRATE PROTEIN.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; X06289; G7853; -.
DR EMBL; X05140; G929563; -.
DR PIR; S00670; S00670.
DR PIR; A26637; A26637.
DR FLYBASE; FBgn0000463; Dl.
DR PROSITE; PS00010; ASX_HYDROXYL; 3.
DR PROSITE; PS00022; EGF_1; 9.
DR PROSITE; PS01186; EGF_2; 9.
DR PROSITE; PS01187; EGF_CA; 2.
DR PFAM; PF00008; EGF; 8.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; TRANSMEMBRANE;
KW EGF-LIKE DOMAIN; GLYCOPROTEIN; SIGNAL.



FT   SIGNAL        1     18       POTENTIAL.
FT   CHAIN        19    880       DELTA PROTEIN.
FT   DOMAIN       19    653       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    654    677       POTENTIAL.
FT   DOMAIN      678    880       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN      227    258       EGF-LIKE 1.
FT   DOMAIN      256    289       EGF-LIKE 2.
FT   DOMAIN      291    329       EGF-LIKE 3.
FT   DOMAIN      331    372       EGF-LIKE 4.
FT   DOMAIN      374    416       EGF-LIKE 5.
FT   DOMAIN      418    451       EGF-LIKE 6.
FT   DOMAIN      453    489       EGF-LIKE 7, CALCIUM-B
FT   DOMAIN      491    527       EGF-LIKE 8.
FT   DOMAIN      529    565       EGF-LIKE 9, CALCIUM-B
FT   DISULFID    231    240       BY SIMILARITY.
FT   DISULFID    235    246       BY SIMILARITY.
FT   DISULFID    248    257       BY SIMILARITY.
FT   DISULFID    260    271       BY SIMILARITY.
FT   DISULFID    266    277       BY SIMILARITY.
FT   DISULFID    279    288       BY SIMILARITY.
FT   DISULFID    295    307       BY SIMILARITY.
FT   DISULFID    301    317       BY SIMILARITY.
FT   DISULFID    319    328       BY SIMILARITY.
FT   DISULFID    335    348       BY SIMILARITY.
FT   DISULFID    342    360       BY SIMILARITY.
FT   DISULFID    362    371       BY SIMILARITY.
FT   DISULFID    378    388       BY SIMILARITY.
FT   DISULFID    383    404       BY SIMILARITY.
FT   DISULFID    406    415       BY SIMILARITY.
FT   DISULFID    422    433       BY SIMILARITY.
FT   DISULFID    427    439       BY SIMILARITY.
FT   DISULFID    441    450       BY SIMILARITY.
FT   DISULFID    457    468       BY SIMILARITY.
FT   DISULFID    462    477       BY SIMILARITY.
FT   DISULFID    479    488       BY SIMILARITY.
FT   DISULFID    495    506       BY SIMILARITY.
FT   DISULFID    500    515       BY SIMILARITY.
FT   DISULFID    517    526       BY SIMILARITY.
FT   DISULFID    533    544       BY SIMILARITY.
FT   DISULFID    538    553       BY SIMILARITY.
FT   DISULFID    555    564       BY SIMILARITY.
FT   CARBOHYD     98     98       POTENTIAL.
FT   CARBOHYD    137    137       POTENTIAL.
FT   CARBOHYD    167    167       POTENTIAL.
FT   CARBOHYD    649    649       POTENTIAL.
FT   CONFLICT    437    438       GK -> ET (IN REF. 2).
FT   CONFLICT    459    459       G -> A (IN REF. 2).
FT   CONFLICT    490    490       S -> T (IN REF. 2).
SQ SEQUENCE 880 AA; 94643 MW; E967E662 CRC32;



Query Match 9.6%; Score 520; DB 1; Length 880;

Best Local Similarity 37.1%; Pred. No. 3.61e-88;
Matches 95; Conservative 55; Mismatches 77; Indels 29; Gaps 22; 

Db 295 CTNHRPCKNGGTCFNTGEGLYTCKCAPGYSGDDCENEIYSCDADVNPCQNGGTCIDEPHT 354 

I I III! : I I: I M : 1 : 1 1 : I MM :| : :|| I :|| ::: 
Qy 183 CLN-SPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYG--SPCLNNATC-KVAQA 238 

Db 355 KTGYKCHCRNGWSGKMCEEKVLTCSDKPCHQG-ICRN-VR--PG-LGS-KG-Q-GYQCE 405 

"I I :l I II :: I : I I I : II : I : : I :|:|: 
Qy 239 GR-FNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCD 297 

Db 406 CPIGYSGPNCDLQLDNCSP--NPCINGGSCQP-SGK--CICPSGFSGTRCETNIDDCLGH 460

II: I I :h h h III I II I :l |:|::||:| MINIM 
Qy 298 CPMEYEGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNV 357

Db 461 QCENGGTCIDMVNQYRCQCVPGFHGTHC--SSKVDL-CL-IRPC-AN-G-GTCL-NLNN 511 

:|:IM:|:| : I I MM I I ::::|: :| : I I |: : |: 
Qy 358 ECQNGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNS 417 

Db 512 -DYQCTCRAGFTGKDC 526 

I: I I: 11:1 I 
Qy 418 SDFTCKCHEGFSGPSC 433 



RESULT 11

ID NTC1_RAT STANDARD; PRT; 2531 AA.
AC Q07008; 

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1 PRECURSOR.

GN NOTCH1.

OS RATTUS NORVEGICUS (RAT).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=SCHWANN CELL;

RX MEDLINE; 92111383.

RA WEINMASTER G., ROBERTS V.J., LEMKE G.;

RT "A homolog of Drosophila Notch expressed during mammalian

RT development.";

RL DEVELOPMENT 113:199-205(1991).

CC -!- FUNCTION: REQUIRED FOR THE CORRECT DIFFERENTIATION OF A NUMBER
CC OF TISSUES.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DEVELOPMENTAL STAGE: IN THE EMBRYO, HIGHEST LEVELS OCCUR BETWEEN
CC DAYS 12 AND 14 AND DECREASE RAPIDLY TO MUCH LOWER LEVELS IN THE
CC ADULT.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; X57405; G57635; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS00022; EGF_1; 35.

DR PROSITE; PS01186; EGF_2; 26.

DR PROSITE; PS01187; EGF_CA; 21.

DR PFAM; PF00008; EGF; 35.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3.

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.



FT   SIGNAL        1     18       POTENTIAL.
FT   CHAIN        19   2531       NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1.
FT   DOMAIN       19   1723       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM   1724   1746       POTENTIAL.
FT   DOMAIN     1747   2531       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       20     58       EGF-LIKE 1.
FT   DOMAIN       59     99       EGF-LIKE 2.
FT   DOMAIN      102    139       EGF-LIKE 3.
FT   DOMAIN      140    176       EGF-LIKE 4.
FT   DOMAIN      178    216       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      218    255       EGF-LIKE 6.
FT   DOMAIN      257    293       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      295    333       EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      335    371       EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      372    410       EGF-LIKE 10.
FT   DOMAIN      412    450       EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      452    488       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      490    526       EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      528    564       EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      566    601       EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      603    639       EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      641    676       EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      678    714       EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      716    751       EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      753    789       EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      791    827       EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      829    867       EGF-LIKE 22.
FT   DOMAIN      869    905       EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      907    943       EGF-LIKE 24.
FT   DOMAIN      945    981       EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      983   1019       EGF-LIKE 26.
FT   DOMAIN     1021   1057       EGF-LIKE 27, CALCIUM-BINDING
FT   DOMAIN     1059   1095       EGF-LIKE 28.
FT   DOMAIN     1097   1143       EGF-LIKE 29.
FT   DOMAIN     1145   1181       EGF-LIKE 30, CALCIUM-BINDING
FT   DOMAIN     1183   1219       EGF-LIKE 31, CALCIUM-BINDING
FT   DOMAIN     1221   1265       EGF-LIKE 32, CALCIUM-BINDING
FT   DOMAIN     1267   1305       EGF-LIKE 33.
FT   DOMAIN     1307   1346       EGF-LIKE 34.
FT   DOMAIN     1348   1384       EGF-LIKE 35.
FT   DOMAIN     1387   1426       EGF-LIKE 36.
FT   DOMAIN     1449   1462       CYS-RICH.
FT   DOMAIN     1865   2076       6 X ANK MOTIF REPEATS.
FT   REPEAT     1865   1910       ANK MOTIF 1.
FT   REPEAT     1912   1942       ANK MOTIF 2.
FT   REPEAT     1944   1975       ANK MOTIF 3.
FT   REPEAT     1978   2009       ANK MOTIF 4.
FT   REPEAT     2011   2042       ANK MOTIF 5.
FT   REPEAT     2044   2076       ANK MOTIF 6.


FT 


DISULFID 


24 


37 


BY SIMILARITY. 


FT 


DISULFID 


31 


46 


BY SIMILARITY. 


FT 


DISULFID 


48 


57 


BY SIMILARITY. 


FT 


DISULFID 


63 


74 


BY SIMILARITY. 


FT 


DISULFID 


68 


87 


BY SIMILARITY. 


FT 


DISULFID 


89 


98 


BY SIMILARITY. 


FT 


DISULFID 


106 


117 


BY SIMILARITY. 


FT 


DISULFID 


111 


127 


BY SIMILARITY. 


FT 


DISULFID 


129 


138 


BY SIMILARITY. 


FT 


DISULFID 


144 


155 


BY SIMILARITY. 


FT 


DISULFID 


149 


164 


BY SIMILARITY. 


FT 


DISULFID 


166 


175 


BY SIMILARITY. 


FT 


DISULFID 


182 


195 


BY SIMILARITY, 


FT 


DISULFID 


189 


204 


BY SIMILARITY. 


FT 


DISULFID 


206 


215 


BY SIMILARITY. 


FT 


DISULFID 


222 


233 


BY SIMILARITY. 


FT 


DISULFID 


227 


243 


BY SIMILARITY. 


FT 


DISULFID 


245 


254 


BY SIMILARITY. 


FT 


DISULFID 


261 


272 


BY SIMILARITY. 


FT 


DISULFID 


266 


281 


BY SIMILARITY. 


FT 


DISULFID 


283 


292 


BY SIMILARITY. 


FT 


DISULFID 


299 


312 


BY SIMILARITY. 


FT 


DISULFID 


306 


321 


BY SIMILARITY. 


FT 


DISULFID 


323 


332 


BY SIMILARITY, 


FT 


DISULFID 


339 


350 


BY SIMILARITY, 


FT 


DISULFID 


344 


359 


BY SIMILARITY, 


FT 


DISULFID 


361 


370 


BY SIMILARITY, 


FT 


DISULFID 


376 


387 


BY SIMILARITY, 


FT 


DISULFID 


381 


398 


BY SIMILARITY, 


FT 


DISULFID 


400 


409 


BY SIMILARITY. 


FT 


DISULFID 


416 


429 


BY SIMILARITY. 


FT 


DISULFID 


423 


438 


BY SIMILARITY, 


FT 


DISULFID 


440 


449 


BY SIMILARITY. 


FT 


DISULFID 


456 


467 


BY SIMILARITY.


FT 


DISULFID 


461 


476 


BY SIMILARITY. 


FT 


DISULFID 


478 


487 


BY SIMILARITY. 


FT 


DISULFID 


494 


505 


BY SIMILARITY. 


FT 


DISULFID 


499 


514 


BY SIMILARITY. 


FT 


DISULFID 


516 


525 


BY SIMILARITY. 


FT 


DISULFID 


532 


543 


BY SIMILARITY.


FT 


DISULFID 


537 


552 


BY SIMILARITY.


FT 


DISULFID 


554 


563 


BY SIMILARITY. 


FT 


DISULFID 


570 


580 


BY SIMILARITY. 


FT 


DISULFID 


575 


589 


BY SIMILARITY. 


FT


DISULFID 


591 


600 


BY SIMILARITY. 


FT 


DISULFID 


607 


618 


BY SIMILARITY. 


FT 


DISULFID 


612 


627 


BY SIMILARITY. 


FT 


DISULFID 


629 


638 


BY SIMILARITY. 


FT 


DISULFID 


645 


655 


BY SIMILARITY. 


FT 


DISULFID 


650 


664 


BY SIMILARITY. 


FT 


DISULFID 


666 


675 


BY SIMILARITY. 


FT 


DISULFID 


682 


693 


BY SIMILARITY. 


FT 


DISULFID 


687 


702 


BY SIMILARITY.


FT 


DISULFID 


704 


713 


BY SIMILARITY.
FT


DISULFID 


720 


730 


BY SIMILARITY 


FT 


DISULFID 


725 


739 


BY SIMILARITY 


FT 


DISULFID 


741 


750 


BY SIMILARITY 


FT 


DISULFID 


757 


768 


BY SIMILARITY 


FT 


DISULFID 


762 


777 


BY SIMILARITY 


FT 


DISULFID 


779 


788 


BY SIMILARITY 


FT 


DISULFID 


795 


806 


BY SIMILARITY 


FT 


DISULFID 


800 


815 


BY SIMILARITY 


FT 


DISULFID 


817 


826 


BY SIMILARITY 


FT 


DISULFID 


833 


844 


BY SIMILARITY 


FT 


DISULFID 


838 


855 


BY SIMILARITY 


FT 


DISULFID 


857 


866 


BY SIMILARITY 


FT 


DISULFID 


873 


884 


BY SIMILARITY 


FT 


DISULFID 


878 


893 


BY SIMILARITY 


FT 


DISULFID 


895 


904 


BY SIMILARITY 


FT 


DISULFID 


911 


922 


BY SIMILARITY


FT


DISULFID


916 


931 


BY SIMILARITY 


FT


DISULFID 


933 


942 


BY SIMILARITY 


FT


DISULFID 


987 


998 


BY SIMILARITY 


FT 


DISULFID 


992 


1007 


BY SIMILARITY 


FT 


DISULFID 


1009 


1018 


BY SIMILARITY 


FT 


DISULFID 


1025 


1036 


BY SIMILARITY 


FT 


DISULFID 


1030 


1045 


BY SIMILARITY 


FT 


DISULFID 


1047 


1056 


BY SIMILARITY 


FT 


DISULFID 


1063 


1074 


BY SIMILARITY 


FT 


DISULFID 


1068 


1083 


BY SIMILARITY 


FT 


DISULFID 


1085 


1094 


BY SIMILARITY 


FT 


DISULFID 


1101 


1122 


BY SIMILARITY 


FT 


DISULFID 


1116 


1131 


BY SIMILARITY 


FT 


DISULFID 


1133 


1142 


BY SIMILARITY 


FT 


DISULFID 


1149 


1160 


BY SIMILARITY 


FT 


DISULFID 


1154 


1169 


BY SIMILARITY 


FT 


DISULFID 


1171 


1180 


BY SIMILARITY 


FT 


DISULFID 


1187 


1198 


BY SIMILARITY 


FT 


DISULFID 


1192 


1207 


BY SIMILARITY 


FT 


DISULFID 


1209 


1218 


BY SIMILARITY 


FT 


DISULFID 


1225 


1244 


BY SIMILARITY 


FT 


DISULFID 


1238 


1253 


BY SIMILARITY 


FT 


DISULFID 


1255 


1264 


BY SIMILARITY 


FT 


DISULFID 


1271 


1284 


BY SIMILARITY 


FT 


DISULFID 


1276 


1293 


BY SIMILARITY 


FT 


DISULFID 


1295 


1304 


BY SIMILARITY 


FT 


DISULFID 


1311 


1322 


BY SIMILARITY 


FT 


DISULFID 


1316 


1334 


BY SIMILARITY


FT


DISULFID


1336 


1345 


BY SIMILARITY 


FT


DISULFID 


1352 


1363 


BY SIMILARITY


FT


DISULFID


1357 


1372 


BY SIMILARITY 


FT 


DISULFID 


1374 


1383 


BY SIMILARITY 


FT 


DISULFID 


1391 


1403 


BY SIMILARITY 



Note: remainder of annotations omitted. 

Query Match 9.6%; Score 520; DB 1; Length 2531; 

Best Local Similarity 37.3%; Pred. No. 3.61e-88; 

Matches 94; Conservative 48; Mismatches 83; Indels 27; Gaps 15; 
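
The percentage on the "Best Local Similarity" line appears to be recoverable from the column counts on the "Matches" line. The sketch below assumes (this is inferred from the numbers in this listing, not from documentation of the search program) that the figure is exact matches divided by all aligned columns, where columns are counted as matches + conservative + mismatches + indels. The function name is hypothetical.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent of aligned columns that are exact matches (assumed definition)."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Two sets of counts reported in this listing.
reported = [
    (94, 48, 83, 27),
    (86, 57, 80, 27),
]
for counts in reported:
    print(f"{best_local_similarity(*counts):.1f}%")
```

Under that assumption the first set gives 94 / 252 and the second 86 / 250, which agree with the percentages printed beside them. This makes the check useful for spotting OCR damage in either line.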

Db 757 CESNPCVNGGTCKDMTS-GYVCTCREGFSGPNCQTNINECASNPCLNQGTC-IDDVAGYK 814 

I 1:1 :| M Ml I :|: I: I ::|l|| :|| : : : :: 
Qy 183 CLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFN 242 

Db 815 CNCPLPYTGATCEWLAPCATSPCKNSG V--CKES-EDYESF — S--CVCPTGW 861 

I I : I II : I II 1:1 III :::|| I I II 
Qy 243 CYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEY 302 

Db 862 QGQTCEIDINECVK--SPCRHGASCQNTNGSYRCLCQAGYTGRNCESDIDDCRPNPCHNG 919 

:| II :: II :ll : : I I II I hi :hll lll::INI: hll 
Qy 303 EGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNG 362 

Db 920 GSCTDGVNAAFCDCLPGFQGAFCE-EDI-N-ECA-TNPCQNGA-NCTDCV—DS-YTC 969 

III II: : II II: hll : : I |::|| :| Ml :| :|| 
Qy 363 GSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNSSDFTC 422 

Db 970 TCPTGFNGIHCE 981 



' I 11:1 I: 
Qy 423 KCHEGFSGPSCD 434 
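
Each `Db` and `Qy` row carries its own start and end coordinates, so a scanned row can be checked for lost or inserted characters. The sketch below is a minimal check, assuming the row layout seen in this listing (label, start, sequence, end, separated by whitespace) and assuming that gap characters do not count as residues. The function name is hypothetical.

```python
def row_is_consistent(row):
    """True when the residue count equals end - start + 1."""
    parts = row.split()
    start, end = int(parts[1]), int(parts[-1])
    # Rejoin the sequence in case OCR introduced stray spaces inside it.
    sequence = "".join(parts[2:-1])
    # Only letters are residues; '-' marks a gap.
    residues = sum(ch.isalpha() for ch in sequence)
    return residues == end - start + 1


print(row_is_consistent("Qy 423 KCHEGFSGPSCD 434"))
print(row_is_consistent("Db 970 TCPTGFNGIHCE 981"))
```

A row that fails the check has probably lost or gained characters in scanning. The check cannot detect a substituted letter, because a substitution leaves the count unchanged.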



RESULT 12 

ID NTC1_MOUSE STANDARD; PRT; 2531 AA. 

AC Q01705; 

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1 PRECURSOR (MOTCH PROTEIN). 

GN NOTCH1 OR MOTCH. 

OS MUS MUSCULUS (MOUSE). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=EMBRYO; 

RX MEDLINE; 93194170. 

RA FRANCO DEL AMO F., GENDRON-MAGUIRE M., SWIATEK P.J., JENKINS N.A., 

RA COPELAND N.G., GRIDLEY T.; 

RT "Cloning, analysis, and chromosomal localization of Notch-1, a mouse 

RT homolog of Drosophila Notch."; 

RL GENOMICS 15:259-264(1993). 

RN [2] 

RP SEQUENCE OF 1551-2170 FROM N.A. 

RC TISSUE=EMBRYO; 

RX MEDLINE; 93048835. 

RA FRANCO DEL AMO F., SMITH D.E., SWIATEK P.J., GENDRON-MAGUIRE M., 

RA GREENSPAN R.J., MCMAHON A.P., GRIDLEY T.; 

RT "Expression pattern of Motch, a mouse homolog of Drosophila Notch, 

RT suggests an important role in early postimplantation mouse 

RT development."; 

RL DEVELOPMENT 115:737-744(1992). 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN. 

CC -!- DEVELOPMENTAL STAGE: EXPRESSED ALMOST UNIFORMLY IN EARLY EMBRYOS. 

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS. 

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS. 

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; Z11886; G288503; -. 

DR MGD; MGI:97363; NOTCH1. 

DR PROSITE; PS00010; ASX_HYDROXYL; 22. 

DR PROSITE; PS00022; EGF_1; 34. 

DR PROSITE; PS01186; EGF_2; 27. 

DR PROSITE; PS01187; EGF_CA; 21. 

DR PFAM; PF00008; EGF; 35. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN. 



FT 


SIGNAL 


1 


18 


POTENTIAL. 


FT 


CHAIN 


19 


2531 


NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 


FT 


DOMAIN 


19 


1725 


EXTRACELLULAR (POTENTIAL). 


FT 


TRANSMEM 


1726 


1746 


POTENTIAL. 


FT 


DOMAIN 


1747 


2531 


CYTOPLASMIC (POTENTIAL). 


FT 


DOMAIN 


24 


1425 


36 X EGF-TYPE REPEATS. 


FT 


DOMAIN 


1449 


1462 


CYS-RICH. 


FT 


DOMAIN 


1445 


1562 


3 X LIN/NOTCH REPEATS. 


FT 


REPEAT 


1445 


1480 


LIN/NOTCH 1. 


FT 


REPEAT 


1481 


1522 


LIN/NOTCH 2. 


FT 


REPEAT 


1523 


1562 


LIN/NOTCH 3. 


FT 


DOMAIN 


1865 


2075 


6 X ANK MOTIF REPEATS. 
FT 


REPEAT 


1865 


1910 


ANK MOTIF 1. 


FT 


REPEAT 


1912 


1942 


ANK MOTIF 2. 


FT 


REPEAT 


1944 


1975 


ANK MOTIF 3. 


FT 


REPEAT 


1978 


2009 


ANK MOTIF 4. 


FT 


REPEAT 


2011 


2042 


ANK MOTIF 5. 


FT 


REPEAT 


2044 


2075 


ANK MOTIF 6. 


FT 


CARBOHYD 


888 


888 


POTENTIAL. 


FT 


CARBOHYD 


959 


959 


POTENTIAL. 


FT 


CARBOHYD 


1179 


1179 


POTENTIAL. 


FT 


CARBOHYD 


1241 


1241 


POTENTIAL. 


FT 


CARBOHYD 


1489 


1489 


POTENTIAL. 


FT 


CARBOHYD 


1587 


1587 


POTENTIAL. 


SQ 


SEQUENCE 


2531 


AA; 271312 MW; AD71189B 



Query Match 9.5%; Score 518; DB 1; Length 2531; 

Best Local Similarity 37.3%; Pred. No. 1.06e-87; 

Matches 94; Conservative 48; Mismatches 83; Indels 27; Gaps 15; 
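
The "Query Match" percentages in this listing are consistent with each hit's score divided by the query's score against itself. The sketch below assumes (hypothetically) that the value 5438 printed in the run header is that self-score; the constant and function names are illustrative only.

```python
PERFECT_SCORE = 5438  # assumed: query self-score taken from the run header


def query_match(score, perfect=PERFECT_SCORE):
    """Hit score as a percentage of the assumed perfect score."""
    return 100.0 * score / perfect


# Scores reported for the hits in this part of the listing.
for score in (520, 518, 508, 505):
    print(score, f"{query_match(score):.1f}%")
```

If the assumption holds, the printed percentages should agree with the "Query Match" values shown beside those scores, which gives a second cross-check on the scanned numbers.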

Db 757 CESNPCVNGGTCKDMTS-GYVCTCREGFSGPNCQTNINECASNPCLNQGTC-IDDVAGYK 814 

I -I! Ml : I I I I 1 1 I : I : I : I ::|||| :|| : : : :: 
Qy 183 CLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFN 242 

Db 815 CNCPLPYTGATCEWLAPCATSPCKNSG V- -CKES-EDYESF- • -S--CVCPTGW 861 

W I I : I II : I I I |:| III :::|| | | || 
Qy 243 CYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEY 302 

Db 862 QGQTCEVDINECVK--SPCRHGASCQNTNGSYRCLCQAGYTGRNCESDIDDCRPNPCHNG 919 

; l II :; I I =11 : : I MM hi '\- ::lllh hll 

Qy 303 EGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNG 362 

Db 920 GSCTDGINTAFCDCLPGFQGAFCE-EDI-N-ECA-SNPCQNGA-NCTDCV— DS-YTC 969 

III III : II lh I :|| : : I ::;|| : :|| :| :|| 
Qy 363 GSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNSSDFTC 422 

Db 970 TCPVGFNGIHCE 981 

I M:l h 
Qy 423 KCHEGFSGPSCD 434 



RESULT 13 

ID DLL1_RAT STANDARD; PRT; 714 AA. 

AC P97677; 

DT 01-NOV-1997 (REL. 35, CREATED) 

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE) 

DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE) 

DE DELTA-LIKE PROTEIN 1 PRECURSOR (DELTA1). 

GN DLL1. 

OS RATTUS NORVEGICUS (RAT). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS. 
RN [1] 

RP SEQUENCE FROM N.A. 

RA DISIBIO G., HEBSHI L., BOULTER J., WEINMASTER G.; 

RL SUBMITTED (DEC-1996) TO EMBL/GENBANK/DDBJ DATA BANKS. 

CC -!- FUNCTION: MAY BE INVOLVED IN CELL-TO-CELL COMMUNICATION IN 
CC MAMMALIAN EMBRYOS. MAY HAVE A ROLE IN CELLULAR INTERACTIONS 
CC UNDERLYING SOMITOGENESIS AND DEVELOPMENT OF THE NERVOUS SYSTEM (BY 
CC SIMILARITY). 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN. 

CC -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: TO DROSOPHILA DELTA PROTEIN. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; U78889; G1699046; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 3. 

DR PROSITE; PS00022; EGF_1; 8. 

DR PROSITE; PS01186; EGF_2; 8. 

DR PROSITE; PS01187; EGF_CA; 2. 

DR PFAM; PF00008; EGF; 6. 

DR HSSP; P00740; 1IXA. 

KW SIGNAL; EGF-LIKE DOMAIN; GLYCOPROTEIN; TRANSMEMBRANE. 


FT 


SIGNAL 


1 


17 


POTENTIAL. 


FT 


CHAIN 


18 


714 


DELTA-LIKE PROTEIN 1. 


FT 


DOMAIN 


18 


537 


EXTRACELLULAR (POTENTIAL). 


FT 


TRANSMEM 


538 


560 


POTENTIAL. 


FT 


DOMAIN 


561 


714 


CYTOPLASMIC (POTENTIAL). 


FT 


DOMAIN 


225 


253 


EGF-LIKE 1. 


FT 


DOMAIN 


256 


284 


EGF-LIKE 2. 


FT 


DOMAIN 


291 


324 


EGF-LIKE 3. 


FT 


DOMAIN 


331 


362 


EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


369 


401 


EGF-LIKE 5. 


FT 


DOMAIN 


408 


439 


EGF-LIKE 6. 


FT 


DOMAIN 


446 


477 


EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


484 


515 


EGF-LIKE 8. 


FT 


DISULFID 


225 


236 


BY SIMILARITY. 


FT 


DISULFID 


229 


242 


BY SIMILARITY. 


FT 


DISULFID 


244 


253 


BY SIMILARITY. 


FT 


DISULFID 


256 


267 


BY SIMILARITY. 


FT 


DISULFID 


262 


273 


BY SIMILARITY. 


FT 


DISULFID 


275 


284 


BY SIMILARITY. 


FT 


DISULFID 


291 


303 


BY SIMILARITY. 


FT 


DISULFID 


297 


313 


BY SIMILARITY. 


FT 


DISULFID 


315 


324 


BY SIMILARITY. 


FT 


DISULFID 


331 


342 


BY SIMILARITY. 


FT 


DISULFID 


336 


351 


BY SIMILARITY. 


FT 


DISULFID 


353 


362 


BY SIMILARITY. 


FT 


DISULFID 


369 


380 


BY SIMILARITY. 


FT 


DISULFID 


374 


390 


BY SIMILARITY. 


FT 


DISULFID 


392 


401 


BY SIMILARITY. 


FT 


DISULFID 


408 


419 


BY SIMILARITY. 


FT 


DISULFID 


413 


428 


BY SIMILARITY. 


FT 


DISULFID 


430 


439 


BY SIMILARITY. 


FT 


DISULFID 


446 


466 


BY SIMILARITY. 


FT 


DISULFID 


468 


477 


BY SIMILARITY. 


FT 


DISULFID 


484 


495 


BY SIMILARITY. 


FT 


DISULFID 


489 


504 


BY SIMILARITY. 


FT 


DISULFID 


506 


515 


BY SIMILARITY. 


FT 


CARBOHYD 


476 


476 


POTENTIAL. 


SQ SEQUENCE 714 AA; 77378 MW; 604B76D1 CRC32; 



Query Match 9.3%; Score 508; DB 1; Length 714; 

Best Local Similarity 34.4%; Pred. No. 2.25e-85; 

Matches 86; Conservative 57; Mismatches 80; Indels 27; Gaps 20; 
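
The three summary lines that precede each alignment follow a regular `name value;` pattern, so they can be read into a dictionary for checking or tabulation. The sketch below is a minimal parser written for illustration; the function name is hypothetical, and the sample text is a cleaned copy of the summary for this hit, with the similarity figure written with a decimal point.

```python
import re

SUMMARY = """Query Match 9.3%; Score 508; DB 1; Length 714;
Best Local Similarity 34.4%; Pred. No. 2.25e-85;
Matches 86; Conservative 57; Mismatches 80; Indels 27; Gaps 20;"""

# A field is a name followed by one number (optionally with exponent or %).
FIELD = re.compile(r"\s*(.+?)\s+(\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)%?\s*$")


def parse_summary(text):
    """Split on ';' and map each field name to its numeric value."""
    fields = {}
    for item in text.replace("\n", " ").split(";"):
        match = FIELD.match(item)
        if match:
            name, value = match.groups()
            fields[name] = float(value)
    return fields


summary = parse_summary(SUMMARY)
print(summary["Score"], summary["Pred. No."], summary["Gaps"])
```

Keeping every value as a float is a simplification; a fuller parser might keep counts such as `Gaps` as integers.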

Db 294 HKPCRNGATCTNTGQGSYTCSCRPGYTGANCELEVDECAPSPCRNGGSCTDLEDS-YSCT 352 

: I: I I |: MM ||: | :|| ::| | ||[ | :; | : : :: | 
Qy 185 NSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGRFNCY 244 

Db 353 CPPGFYGKVCE--LSA-M-T-CADGP-CFNGGR-CSD---N--P-D-GGYTCHCPAGFSG 398 

I : = : I :l I : I II: I : : ;| I || : I 

Qy 245 CNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPMEYEG 304 

Db 399 FNCEKKIDLCSS--SPCSNGAKCVDLGNSYLCRCQTGFSGRYCEDNVDDCASSPCANGGT 456 

; l I h: I: :|| I :||: : II I I 1 1 : 1 II hill : I III: 
Qy 305 KHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQNGGS 364 

Db 457 CRDSVNDFSCTCPPGYTGRNCS-APV-SR-CEHA-PCH-N--G-ATC--HQRGQRYMCEC 506 

I h: : I I 1 1 1 : 1 : I :|: : : :|: : | : i | : : | | 
Qy 365 CVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQGECVASQNSSDFTCKC 424 

Db 507 AQGYGGANCQ 516 

:|::h:h 
Qy 425 HEGFSGPSCD 434 



RESULT 14 

ID NTC3_MOUSE STANDARD; PRT; 2318 AA. 
AC Q61982; 

DT 01-NOV-1997 (REL. 35, CREATED) 



Tue Jun 1 10:16:33 1999 US-09-191-647-9.rsp 

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH 3 PROTEIN. 

GN NOTCH3. 

OS MUS MUSCULUS (MOUSE). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=ICR X SWISS WEBSTER; 

RX MEDLINE; 95001556. 

RA LARDELLI M., DALSTRAND J., LENDAHL U.; 

RT "The novel Notch homologue mouse Notch 3 lacks specific epidermal 

RT growth factor-repeats and is expressed in proliferating 

RT neuroepithelial."; 

RL MECH. DEV. 46:123-136(1994). 

CC -!- FUNCTION: NOTCH 1, 2 AND 3 PLAY A COMBINATIONAL ROLE DURING 
CC VARIOUS CELL FATE DECISIONS AND MORPHOLOGICAL MOVEMENTS IN THE 
CC DEVELOPING CNS AND PROBABLY OTHER REGIONS OF THE EMBRYO. 

CC -!- TISSUE SPECIFICITY: PROLIFERATING NEUROEPITHELIUM. 

CC -!- DEVELOPMENTAL STAGE: CNS DEVELOPMENT. 

CC -!- SIMILARITY: CONTAINS 34 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS. 

CC -!- SIMILARITY: CONTAINS 6 CDC10/SWI6 REPEATS. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; X74760; G483581; -. 

DR MGD; MGI:99460; NOTCH3. 

DR PROSITE; PS00010; ASX_HYDROXYL; 18. 

DR PROSITE; PS00022; EGF_1; 33. 

DR PROSITE; PS01186; EGF_2; 27. 

DR PROSITE; PS01187; EGF_CA; 17. 

DR PFAM; PF00008; EGF; 33. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE; 

KW GLYCOPROTEIN. 



FT 


DOMAIN 


1 


1643 


EXTRACELLULAR. 


FT 


TRANSMEM 


1644 


1664 


POTENTIAL. 


FT 


DOMAIN 


1665 


2318 


CYTOPLASMIC. 


FT 


DOMAIN 


39 


1374 


34 X EGF-TYPE REPEATS. 


FT 


DOMAIN 


1388 


1503 


3 X LIN/NOTCH REPEATS. 


FT 


DOMAIN 


1784 


1998 


6 X CDC10/SWI6 REPEATS. 


FT 


DOMAIN 


2242 


2261 


PEST. 


FT 


DOMAIN 


39 


78 


EGF-LIKE 1. 


FT 


DOMAIN 


79 


119 


EGF-LIKE 2. 


FT 


DOMAIN 


120 


157 


EGF-LIKE 3. 


FT 


DOMAIN 


159 


196 


EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


198 


235 


EGF-LIKE 5. 


FT 


DOMAIN 


237 


273 


EGF-LIKE 6, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


275 


313 


EGF-LIKE 7. 


FT 


DOMAIN 


315 


351 


EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


352 


390 


EGF-LIKE 9. 


FT 


DOMAIN 


392 


430 


EGF-LIKE 10, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


432 


468 


EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


470 


506 


EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


508 


544 


EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


546 


581 


EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


583 


619 


EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


621 


656 


EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


658 


694 


EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


696 


731 


EGF-LIKE 18. 


FT 


DOMAIN 


735 


771 


EGF-LIKE 19. 


FT 


DOMAIN 


772 


809 


EGF-LIKE 20. 


FT 


DOMAIN 


811 


848 


EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL). 



FT 


DOMAIN 


850 


886 


EGF-LIKE 22, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


888 


923 


EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


925 


961 


EGF-LIKE 24. 


FT 


DOMAIN 


963 


999 


EGF-LIKE 25. 


FT 


DOMAIN 


1001 


1035 


EGF-LIKE 26. 


FT 


DOMAIN 


1037 


1083 


EGF-LIKE 27. 


FT 


DOMAIN 


1085 


1121 


EGF-LIKE 28. 


FT 


DOMAIN 


1123 


1159 


EGF-LIKE 29, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


1161 


1204 


EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


1206 


1245 


EGF-LIKE 31. 


FT 


DOMAIN 


1247 


1288 


EGF-LIKE 32. 


FT 


DOMAIN 


1290 


1326 


EGF-LIKE 33. 


FT 


DOMAIN 


1336 


1374 


EGF-LIKE 34. 


FT 


REPEAT 


1388 


1428 


LIN/NOTCH 1. 


FT 


REPEAT 


1429 


1467 


LIN/NOTCH 2. 


FT 


REPEAT 


1468 


1503 


LIN/NOTCH 3. 


FT 


REPEAT 


1784 


1816 


CDC10/SWI6 1. 


FT 


REPEAT 


1817 


1865 


CDC10/SWI6 2. 


FT 


REPEAT 


1866 


1898 


CDC10/SWI6 3. 


FT 


REPEAT 


1899 


1932 


CDC10/SWI6 4. 


FT 


REPEAT 


1933 


1965 


CDC10/SWI6 5. 


FT 


REPEAT 


1966 


1998 


CDC10/SWI6 6. 


FT 


DISULFID 


43 


55 


BY SIMILARITY. 


FT 


DISULFID 


49 


66 


BY SIMILARITY. 


FT 


DISULFID 


68 


77 


BY SIMILARITY. 


FT 


DISULFID 


83 


94 


BY SIMILARITY. 


FT 


DISULFID 


88 


107 


BY SIMILARITY. 


FT 


DISULFID 


109 


118 


BY SIMILARITY. 


FT 


DISULFID 


124 


135 


BY SIMILARITY. 


FT 


DISULFID 


129 


145 


BY SIMILARITY. 


FT 


DISULFID 


147 


156 


BY SIMILARITY. 


FT 


DISULFID 


163 


175 


BY SIMILARITY. 


FT 


DISULFID 


169 


184 


BY SIMILARITY. 


FT 


DISULFID 


186 


195 


BY SIMILARITY. 


FT 


DISULFID 


202 


213 


BY SIMILARITY. 


FT 


DISULFID 


207 


223 


BY SIMILARITY. 


FT 


DISULFID 


225 


234 


BY SIMILARITY. 


FT 


DISULFID 


241 


252 


BY SIMILARITY. 


FT 


DISULFID 


246 


261 


BY SIMILARITY. 


FT 


DISULFID 


263 


272 


BY SIMILARITY. 


FT 


DISULFID 


279 


292 


BY SIMILARITY. 


FT 


DISULFID 


286 


301 


BY SIMILARITY. 


FT 


DISULFID 


303 


312 


BY SIMILARITY. 


FT 


DISULFID 


319 


330 


BY SIMILARITY. 


FT 


DISULFID 


324 


339 


BY SIMILARITY. 


FT 


DISULFID 


341 


350 


BY SIMILARITY. 


FT 


DISULFID 


356 


367 


BY SIMILARITY. 


FT 


DISULFID 


361 


378 


BY SIMILARITY. 


FT 


DISULFID 


380 


389 


BY SIMILARITY. 


FT 


DISULFID 


396 


409 


BY SIMILARITY. 


FT 


DISULFID 


403 


418 


BY SIMILARITY. 


FT 


DISULFID 


420 


429 


BY SIMILARITY. 


FT 


DISULFID 


436 


447 


BY SIMILARITY. 


FT 


DISULFID 


441 


456 


BY SIMILARITY. 


FT 


DISULFID 


458 


467 


BY SIMILARITY. 


FT 


DISULFID 


474 


485 


BY SIMILARITY. 


FT 


DISULFID 


479 


494 


BY SIMILARITY. 


FT 


DISULFID 


496 


505 


BY SIMILARITY. 


FT 


DISULFID 


512 


523 


BY SIMILARITY. 


FT 


DISULFID 


517 


532 


BY SIMILARITY. 


FT 


DISULFID 


534 


543 


BY SIMILARITY. 


FT 


DISULFID 


550 


560 


BY SIMILARITY. 


FT 


DISULFID 


555 


569 


BY SIMILARITY. 


FT 


DISULFID 


571 


580 


BY SIMILARITY. 


FT 


DISULFID 


587 


598 


BY SIMILARITY. 


FT 


DISULFID 


592 


607 


BY SIMILARITY. 


FT 


DISULFID 


609 


618 


BY SIMILARITY. 


FT 


DISULFID 


625 


635 


BY SIMILARITY. 


FT 


DISULFID 


630 


644 


BY SIMILARITY. 


FT 


DISULFID 


646 


655 


BY SIMILARITY. 


FT 


DISULFID 


662 


673 


BY SIMILARITY. 


FT 


DISULFID 


667 


682 


BY SIMILARITY. 


FT 


DISULFID 


684 


693 


BY SIMILARITY. 



FT 


DISULFID 


700 


710 


BY SIMILARITY 


FT 


DISULFID 


705 


719 


BY SIMILARITY 


FT 


DISULFID 


721 


730 


BY SIMILARITY 


FT 


DISULFID 


739 


750 


BY SIMILARITY 


FT 


DISULFID 


744 


759 


BY SIMILARITY 


FT 


DISULFID 


761 


770 


BY SIMILARITY 


FT 


DISULFID 


776 


787 


BY SIMILARITY 


FT 


DISULFID 


781 


797 


BY SIMILARITY 


FT 


DISULFID 


799 


808 


BY SIMILARITY 


FT 


DISULFID 


815 


827 


BY SIMILARITY 


FT 


DISULFID 


821 


836 


BY SIMILARITY 


FT 


DISULFID 


838 


847 


BY SIMILARITY 


FT 


DISULFID 


854 


865 


BY SIMILARITY 


FT 


DISULFID 


859 


874 


BY SIMILARITY 


FT 


DISULFID 


876 


885 


BY SIMILARITY 


FT 


DISULFID 


892 


902 


BY SIMILARITY 


FT 


DISULFID 


897 


911 


BY SIMILARITY 


FT 


DISULFID 


913 


922 


BY SIMILARITY 


FT 


DISULFID 


929 


940 


BY SIMILARITY 


FT 


DISULFID 


934 


949 


BY SIMILARITY 


FT 


DISULFID 


951 


960 


BY SIMILARITY 


FT 


DISULFID 


967 


978 


BY SIMILARITY 


FT 


DISULFID 


972 


987 


BY SIMILARITY 


FT 


DISULFID 


989 


998 


BY SIMILARITY 


FT 


DISULFID 


1005 


1016 


BY SIMILARITY 


FT 


DISULFID 


1010 


1023 


BY SIMILARITY 


FT 


DISULFID 


1025 


1034 


BY SIMILARITY 


FT 


DISULFID 


1041 


1062 


BY SIMILARITY 


FT 


DISULFID 


1056 


1071 


BY SIMILARITY 


FT 


DISULFID 


1073 


1082 


BY SIMILARITY 


FT 


DISULFID 


1089 


1100 


BY SIMILARITY 


FT 


DISULFID 


1094 


1109 


BY SIMILARITY 


FT 


DISULFID 


1111 


1120 


BY SIMILARITY 


FT 


DISULFID 


1127 


1138 


BY SIMILARITY 


FT 


DISULFID 


1132 


1147 


BY SIMILARITY 


FT 


DISULFID 


1149 


1158 


BY SIMILARITY 


FT 


DISULFID 


1165 


1183 


BY SIMILARITY 


FT 


DISULFID 


1177 


1192 


BY SIMILARITY 


FT 


DISULFID 


1194 


1203 


BY SIMILARITY 


FT 


DISULFID 


1210 


1223 


BY SIMILARITY 


FT 


DISULFID 


1215 


1233 


BY SIMILARITY 


FT 


DISULFID 


1235 


1244 


BY SIMILARITY 


FT 


DISULFID 


1251 


1262 


BY SIMILARITY 


FT 


DISULFID 


1256 


1276 


BY SIMILARITY 


FT 


DISULFID 


1278 


1287 


BY SIMILARITY 


FT 


DISULFID 


1294 


1305 


BY SIMILARITY 


FT 


DISULFID 


1299 


1314 


BY SIMILARITY 


FT 


DISULFID 


1316 


1325 


BY SIMILARITY 


FT 


DISULFID 


1340 


1351 


BY SIMILARITY 


FT 


DISULFID 


1345 


1362 


BY SIMILARITY 


FT 


DISULFID 


1364 


1373 


BY SIMILARITY 



Note: remainder of annotations omitted. 

Query Match 9.3%; Score 505; DB 1; Length 2318; 

Best Local Similarity 34.7%; Pred. No. 1.12e-84; 

Matches 90; Conservative 50; Mismatches 103; Indels 16; Gaps 13; 

Db 965 DPCFSRPCLHGGICNPTHP-GFECTCREGFTGSQCQNPVDWCSQAPCQNGGRC-V-QTGA 1021 

I I- II : HI: I : Ml III :|:| Ml :|| I : I I |:| 
Qy 181 DLCLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAGR 240 

Db 1022 Y-CICPPGWSGRLCDIQSLPCTEAAAQMG— VRLEQLCQEGGKC IDKGRSH - Y - CVCPE 1075 

: I I I I I: I :: : I I I : : I I I : : : : I I 1 1 
Qy 241 FNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCDCPM 300 

Db 1076 GRTGSHCEHEVDPCTAQ--PCQHGGTCRGYMGGYVCECPAGYAGDSCEDNIDECASQPCQ 1133 

I III - II II:: II hi I |::|::|::|| IN : II 
Qy 301 EYEGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKNVECQ 360 

Db 1134 NGGSCIDLVARYLCSCPPGTLGVLCEI NEDDCDLG * PSLDSGVQCLHN- GTCVD- LVGG - 1189 

llllhl : I I I II I III I: |: I III :: 
Qy 361 NGGSCVDGILSYDCLCRPGYAGQYCEIPPMM-DMEYQKTDACQQSACGQGECVASQNSSD 419 



Db 1190 FRCNCPPGYTGLHCEADIN 1208 

I 1:1 l::l |: ::: 

Qy 420 FTCKCHEGFSGPSCDRQMS 438 

RESULT 15 

ID NTC1_HUMAN STANDARD; PRT; 2444 AA. 

AC P46531; 

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 

DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG 1 PRECURSOR (TRANSLOCATION- 

DE ASSOCIATED NOTCH PROTEIN TAN-1) (FRAGMENT). 

GN NOTCH1 OR TAN1. 

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 91347367. 

RA ELLISEN L.W., BIRD J., WEST D.C., SORENG A.L., REYNOLDS T.C., 

RA SMITH S.D., SKLAR J.; 

RT "TAN-1, the human homolog of the Drosophila notch gene, is broken by 

RT chromosomal translocations in T lymphoblastic neoplasms."; 

RL CELL 66:649-661(1991). 

CC -!- FUNCTION: MAY BE IMPORTANT FOR NORMAL LYMPHOCYTE FUNCTION. IN 

CC ALTERED FORM, MAY CONTRIBUTE TO TRANSFORMATION OR PROGRESSION 

CC IN SOME T-CELL NEOPLASMS. 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN. 

CC -!- TISSUE SPECIFICITY: IN FETAL TISSUES MOST ABUNDANT IN SPLEEN, 

CC BRAIN STEM AND LUNG. ALSO PRESENT IN MOST ADULT TISSUES WHERE IT 

CC IS FOUND MAINLY IN LYMPHOID TISSUES. 

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS. 

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS. 

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; M73980; G338675; -. 

DR MIM; 190198; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 20. 

DR PROSITE; PS00022; EGF_1; 34. 

DR PROSITE; PS01186; EGF_2; 26. 

DR PROSITE; PS01187; EGF_CA; 18. 

DR PFAM; PF00008; EGF; 35. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN. 



FT 


SIGNAL 


1 


18 


POTENTIAL. 


FT 


CHAIN 


19 


>2444 


NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG 1. 


FT 


DOMAIN 


19 


1736 


EXTRACELLULAR (POTENTIAL). 


FT 


TRANSMEM 


1737 


1757 


POTENTIAL. 


FT 


DOMAIN 


1758 


>2444 


CYTOPLASMIC (POTENTIAL). 


FT 


DOMAIN 


20 


58 


EGF-LIKE 1. 


FT 


DOMAIN 


59 


99 


EGF-LIKE 2. 


FT 


DOMAIN 


102 


139 


EGF-LIKE 3. 


FT 


DOMAIN 


140 


176 


EGF-LIKE 4. 


FT 


DOMAIN 


178 


216 


EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


218 


255 


EGF-LIKE 6. 


FT 


DOMAIN 


257 


293 


EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


295 


333 


EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


335 


371 


EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


372 


410 


EGF-LIKE 10. 
FT 


DOMAIN 


412 


450 


EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


452 


488 


EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


490 


526 


EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


528 


564 


EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


566 


601 


EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


603 


639 


EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


641 


676 


EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


678 


714 


EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


716 


751 


EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


753 


789 


EGF-LIKE 20. 


FT 


DOMAIN 


791 


827 


EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


829 


868 


EGF-LIKE 22. 


FT 


DOMAIN 


870 


906 


EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


908 


944 


EGF-LIKE 24. 


FT 


DOMAIN 


946 


982 


EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


984 


1020 


EGF-LIKE 26. 


FT 


DOMAIN 


1022 


1058 


EGF-LIKE 27. 


FT 


DOMAIN 


1060 


1096 


EGF-LIKE 28. 


FT 


DOMAIN 


1098 


1144 


EGF-LIKE 29. 


FT 


DOMAIN 


1146 


1182 


EGF-LIKE 30. 


FT 


DOMAIN 


1184 


1220 


EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


1222 


1266 


EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


1268 


1306 


EGF-LIKE 33. 


FT 


DOMAIN 


1308 


1347 


EGF-LIKE 34. 


FT 


DOMAIN 


1349 


1385 


EGF-LIKE 35. 


FT 


DOMAIN 


1388 


1427 


EGF-LIKE 36. 


FT 


DOMAIN 


1446 


1563 


3 X LIN/NOTCH REPEATS. 


FT 


REPEAT 


1446 


1481 


LIN/NOTCH 1. 


FT 


REPEAT 


1482 


1523 


LIN/NOTCH 2. 


FT 


REPEAT 


1524 


1563 


LIN/NOTCH 3. 


FT 


DOMAIN 


1876 


2087 


6 X ANK MOTIF REPEATS. 


FT 


REPEAT 


1876 


1921 


ANK MOTIF 1. 


FT 


REPEAT 


1923 


1954 


ANK MOTIF 2. 


FT 


REPEAT 


1956 


1987 


ANK MOTIF 3. 


FT 


REPEAT 


1990 


2021 


ANK MOTIF 4. 


FT 


REPEAT 


2023 


2054 


ANK MOTIF 5. 


FT 


REPEAT 


2056 


2087 


ANK MOTIF 6. 


FT 


DOMAIN 


1576 


1579 


POLY-VAL. 


FT 


DOMAIN 


1662 


1665 


POLY-ARG. 


FT 


DOMAIN 


1729 


1732 


POLY-PRO. 


FT 


DOMAIN 


1741 


1744 


POLY-ALA. 


FT 


DOMAIN 


1902 


1905 


POLY-GLU. 


FT 


DOMAIN 


2260 


2263 


POLY-GLY. 


FT 


DOMAIN 


2404 


2407 


POLY-GLN. 


FT 


DOMAIN 


2411 


2418 


POLY-PRO. 


FT DISULFID 24 37 BY SIMILARITY. 
FT DISULFID 31 46 BY SIMILARITY. 
FT DISULFID 48 57 BY SIMILARITY. 
FT DISULFID 63 74 BY SIMILARITY. 
FT DISULFID 68 87 BY SIMILARITY. 
FT DISULFID 89 98 BY SIMILARITY. 
FT DISULFID 106 117 BY SIMILARITY. 
FT DISULFID 111 127 BY SIMILARITY. 
FT DISULFID 129 138 BY SIMILARITY. 
FT DISULFID 144 155 BY SIMILARITY. 
FT DISULFID 149 164 BY SIMILARITY. 
FT DISULFID 166 175 BY SIMILARITY. 
FT DISULFID 182 195 BY SIMILARITY. 
FT DISULFID 189 204 BY SIMILARITY. 
FT DISULFID 206 215 BY SIMILARITY. 
FT DISULFID 222 233 BY SIMILARITY. 
FT DISULFID 227 243 BY SIMILARITY. 
FT DISULFID 245 254 BY SIMILARITY. 
FT DISULFID 261 272 BY SIMILARITY. 
FT DISULFID 266 281 BY SIMILARITY. 
FT DISULFID 283 292 BY SIMILARITY. 
FT DISULFID 299 312 BY SIMILARITY. 
FT DISULFID 306 321 BY SIMILARITY. 
FT DISULFID 323 332 BY SIMILARITY. 
FT DISULFID 339 350 BY SIMILARITY. 
FT DISULFID 344 359 BY SIMILARITY. 
FT DISULFID 361 370 BY SIMILARITY. 
FT DISULFID 376 387 BY SIMILARITY. 



FT   DISULFID    381    398       BY SIMILARITY.
FT   DISULFID    400    409       BY SIMILARITY.
FT   DISULFID    416    429       BY SIMILARITY.
FT   DISULFID    423    438       BY SIMILARITY.
FT   DISULFID    440    449       BY SIMILARITY.
FT   DISULFID    456    467       BY SIMILARITY.
FT   DISULFID    461    476       BY SIMILARITY.
FT   DISULFID    478    487       BY SIMILARITY.
FT   DISULFID    494    505       BY SIMILARITY.
FT   DISULFID    499    514       BY SIMILARITY.
FT   DISULFID    516    525       BY SIMILARITY.
FT   DISULFID    532    543       BY SIMILARITY.
FT   DISULFID    537    552       BY SIMILARITY.
FT   DISULFID    554    563       BY SIMILARITY.
FT   DISULFID    570    580       BY SIMILARITY.
FT   DISULFID    575    589       BY SIMILARITY.
FT   DISULFID    591    600       BY SIMILARITY.
FT   DISULFID    607    618       BY SIMILARITY.
FT   DISULFID    612    627       BY SIMILARITY.
FT   DISULFID    629    638       BY SIMILARITY.
FT   DISULFID    645    655       BY SIMILARITY.
FT   DISULFID    650    664       BY SIMILARITY.
FT   DISULFID    666    675       BY SIMILARITY.
FT   DISULFID    682    693       BY SIMILARITY.
FT   DISULFID    687    702       BY SIMILARITY.
FT   DISULFID    704    713       BY SIMILARITY.
FT   DISULFID    720    730       BY SIMILARITY.
FT   DISULFID    725    739       BY SIMILARITY.
FT   DISULFID    741    750       BY SIMILARITY.
FT   DISULFID    757    768       BY SIMILARITY.
FT   DISULFID    762    777       BY SIMILARITY.


FT   DISULFID    779    788       BY SIMILARITY.
FT   DISULFID    795    806       BY SIMILARITY.
FT   DISULFID    800    815       BY SIMILARITY.
FT   DISULFID    817    826       BY SIMILARITY.
FT   DISULFID    833    844       BY SIMILARITY.
FT   DISULFID    838    855       BY SIMILARITY.
FT   DISULFID    857    867       BY SIMILARITY.
FT   DISULFID    874    885       BY SIMILARITY.
FT   DISULFID    879    894       BY SIMILARITY.
FT   DISULFID    896    905       BY SIMILARITY.
FT   DISULFID    912    923       BY SIMILARITY.
FT   DISULFID    917    932       BY SIMILARITY.
FT   DISULFID    934    943       BY SIMILARITY.
FT   DISULFID    988    999       BY SIMILARITY.
FT   DISULFID    993   1008       BY SIMILARITY.
FT   DISULFID   1010   1019       BY SIMILARITY.
FT   DISULFID   1026   1037       BY SIMILARITY.
FT   DISULFID   1031   1046       BY SIMILARITY.
FT   DISULFID   1048   1057       BY SIMILARITY.
FT   DISULFID   1064   1075       BY SIMILARITY.
FT   DISULFID   1069   1084       BY SIMILARITY.
FT   DISULFID   1086   1095       BY SIMILARITY.
FT   DISULFID   1102   1123       BY SIMILARITY.
FT   DISULFID   1117   1132       BY SIMILARITY.
FT   DISULFID   1134   1143       BY SIMILARITY.
FT   DISULFID   1150   1161       BY SIMILARITY.
FT   DISULFID   1155   1170       BY SIMILARITY.
FT   DISULFID   1172   1181       BY SIMILARITY.
FT   DISULFID   1188   1199       BY SIMILARITY.
FT   DISULFID   1193   1208       BY SIMILARITY.



Note: remainder of annotations omitted. 



Query Match 9.3%; Score 506; DB 1; Length 2444; 

Best Local Similarity 38.2%; Pred. No. 6.56e-85; 
Matches 79; Conservative 43; Mismatches 64; Indels 21; Gaps 11; 

Db 416 CSLGAN-PCEHAGKCINTLGS-FECQCLQGYTGPRCEIDVNECVSNPCQNDATC-LDQIG 472 

II I II : : I I : : I I |: I :|| ::: I ::|| |:||| : I I 
Qy 180 CDLCLNSPCKNNAICETTSSRKYTCNCTPGFYGVHCENQIDACYGSPCLNNATCKVAQAG 239 

Db 473 EFQCMCMPGYEGVHCEVNTDECASSPCLHNGRC - - L DKI-N-E-FQ CECP 517 
I I I Ml II I 1:1 :| I : |:| I : : I : II |:|| 
Qy 240 RFNCYCNKGFEGDYCEKNIDDCVNSKCENGGKCVDLVRFCSEELKNFQSFQINSYRCKP 299 

Db 518 TGFTGHLCQYDVDECAST--PCKNGAKCLDGPNTYTCVCTEGYTGTHCEVDIDECDPDPC 575 

: I I: h II I :||: :|:|:|: |:|| :|| :||:| I 
Qy 300 MEYEGKHCEDKLEYCTKKLNPCENNGKCIPINGSYSCMCSPGFTGNNCETNIDDCKKVEC 359 

Db 576 HYG-SCKDGVATFTCLCRPGYTGHHCE 601 

: I II II: :: llllllhl: II 
Qy 360 QNGGSCVDGILSYDCLCRPGYAGQYCE 386 



Search completed: Fri May 28 09:12:50 1999 
Job time : 59 secs. 



Release 3.1A John F. Collins, Biocomputing Research Unit. 
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm 

Run on: Fri May 2 



Tabular output not generated 



09:18:21 1999; MasPar time 8.31 Seconds 

394.061 Million cell updates/sec 
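
The throughput figure above seems consistent with counting one cell update per pair of query residue and database residue. The sketch below is only an illustration under that assumption; the function name is hypothetical and is not part of the search program. It uses the query length (154), database size (21266608 residues) and MasPar time (8.31 s) reported in this run's header.

```python
def cell_updates_per_sec(query_len: int, db_residues: int, seconds: float) -> float:
    """Dynamic-programming cells filled per second, assuming one cell
    per (query residue, database residue) pair."""
    return query_len * db_residues / seconds


# Values taken from this report: 154-residue query, 21266608 residues, 8.31 s.
rate = cell_updates_per_sec(154, 21266608, 8.31)
print(f"{rate / 1e6:.1f} million cell updates/sec")
```

The result lands close to the reported 394.061 million; the small difference is what one would expect from the run time being printed to only two decimals.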



Title:          MJS-09-191-647-10 
Description:    (1-154) from US09191647.pep 
Perfect Score:  1160 
Sequence:       1 DPLPVHHRCECMLGYTGDNC EDNGILLYNGDNDHIAVELY 154 

Scoring table:  PAM 150 
                Gap 11 

Searched:       170751 seqs, 21266608 residues 
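
For readers unfamiliar with the Smith-Waterman algorithm named in the header, the following is a minimal sketch of the local-alignment score recurrence. It is not the MPsrch implementation: the match/mismatch values are placeholders standing in for the PAM 150 table, and treating "Gap 11" as a flat per-position penalty is an assumption made only for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 5,
                         mismatch: int = -4, gap: int = 11) -> int:
    """Best local alignment score with a linear gap penalty.

    match/mismatch are illustrative stand-ins for a substitution matrix.
    """
    a, b = a.upper(), b.upper()
    prev = [0] * (len(b) + 1)   # previous row of the DP matrix
    best = 0
    for ca in a:
        curr = [0]              # column 0 is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] - gap          # gap in b
            left = curr[j - 1] - gap    # gap in a
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

Each inner-loop iteration fills one matrix cell, which is the unit counted by the "cell updates/sec" figure in the header.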



Post-processing: Minimum Match 0% 

Listing first 45 summaries 

Database: a-geneseq35 

1:part1 2:part2 3:part3 4:part4 5:part5 6:part6 7:part7 
8:part8 9:part9 10:part10 11:part11 12:part12 13:part13 
14:part14 15:part15 16:part16 17:part17 18:part18 
19:part19 20:part20 21:part21 22:part22 23:part23 
24:part24 25:part25 26:part26 27:part27 28:part28 
29:part29 30:part30 31:part31 32:part32 33:part33 
34:part34 35:part35 36:part36 37:part37 38:part38 
39:part39 

Statistics: Mean 29.725; Variance 145.891; scale 0.204 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



                                SUMMARIES 

            Query 
No.  Score  Match  Length  DB  ID      Description            Pred. No. 

  1   1026   88.4    1534  30  W46966  Amino acid sequence o  1.17e-76 
  2    462   39.8    1480   5  R25079  Drosophila SLIT prote  1.72e-28 
  3    335   28.9     520  25  W18348  Proliferation and dif  5.96e-18 
  4    335   28.9     702  25  W18349  Proliferation and dif  5.96e-18 
  5    335   28.9     723  25  W18353  Proliferation and dif  5.96e-18 
  6    325   28.0     727  21  W11719  C-Delta-1 polypeptide  3.93e-17 
  7    325   28.0     740  21  W00876  C-Delta-1 polypeptide  3.93e-17 
  8    323   27.8     660  21  W11725  H-Delta-1 polypeptide  5.73e-17 
  9    321   27.7     722  21  W11720  M-Delta-1 polypeptide  8.35e-17 
 10    292   25.2    1036  25  W18351  Proliferation and dif  1.92e-14 
 11    292   25.2    1187  25  W18352  Proliferation and dif  1.92e-14 
 12    292   25.2    1218  25  W18354  Proliferation and dif  1.92e-14 
 13    292   25.2    1218  19  W05833  Human Serrate-1 (HJ1)  1.92e-14 
 14    292   25.2    1218  29  W44301  Human serrate 1.       1.92e-14 
 15    290   25.0    1193  19  W05835  Chick Serrate.         2.79e-14 
 16    287   24.7    1208  28  W40827  Human Jagged protein.  4.88e-14 
 17    280   24.1    2321  36  W49698  Human Notch3 protein.  1.80e-13 
 18    278   24.0    1872  36  W68510  Partial human Notch-3  2.61e-13 
 19    277   23.9    1055  29  W44298  Human serrate 2 prote  3.14e-13 
 20    277   23.9    1212  29  W44299  Human serrate 2.       3.14e-13 
 21    277   23.9    1257  19  W05834  Human Serrate-2 (HJ2)  3.14e-13 
 22    274   23.6     612  28  W39256  Human partial mature   5.48e-13 
 23    274   23.6     737  28  W39257  Human membrane protei  5.48e-13 
 24    271   23.4    2707  24  W27161  Mouse receptor ME2.    9.57e-13 
 25    265   22.8     685  37  W80813  Nucleotide sequence o  2.91e-12 
 26    265   22.8    1404   7  R38304  Sequence of a serrate  2.91e-12 
 27    253   21.8     157  21  W11730  H-Delta-1 polypeptide  2.66e-11 
 28    242   20.9     833   6  R28960  Delta Dll.             2.01e-10 
 29    237   20.4     385  10  R56167  Neuroendocrine tumor   5.02e-10 
 30    228   19.7    1572  24  W27160  Mouse receptor ME2 re  2.60e-09 
 31    227   19.6    1257   9  R46627  Neurocan core protein  3.11e-09 
 32    222   19.1     383  10  R56166  Neuroendocrine tumor   7.73e-09 
 33    219   18.9    2409   3  R12609  Versican.              1.33e-08 
 34    218   18.8     473  17  R86869  Adhesive protein.      1.60e-08 
 35    210   18.1      77   6  R28962  ELR-11 and -12.        5.77e-08 
 36    205   17.7    4544  11  R60517  Human alpha-2-MR.      1.67e-07 
 37    205   17.7    4544   9  R47861  Alpha 2-Macroglobulin  1.67e-07 
 38    192   16.6     810  27  W37500  Human nel-related pro  1.71e-06 
 39    184   15.9    2189   1  R05222  Antigen GX5401FL enco  7.06e-06 
 40    181   15.6     179  37  W75100  Human secreted protei  1.20e-05 
 41    174   15.0     816  27  W37501  Human nel-related pro  4.12e-05 
 42    173   14.9     228  30  W46967  Amino acid sequence o  4.91e-05 
 43    172   14.8     240  33  W64219  Human secreted protei  5.85e-05 
 44    171   14.7    1833  14  R79478  Mouse LTBP-2.          6.97e-05 
 45    165   14.2     466   8  R52562  Factor VIII.           1.99e-04 


RESULT 1 
ID W46966 standard; Protein; 1534 AA. 
AC W46966; 
DT 06-JUL-1998 (first entry) 
DE Amino acid sequence of a human slit-like polypeptide. 
KW Slit-like protein; human; diagnosis; treatment; brain-specific disease; 
KW cancer; antibody. 
OS Homo sapiens. 
FH Key Location/Qualifiers 
FT Peptide 1..26 
FT /note= "signal peptide" 
FT Protein 27..1534 
FT /note= "mature protein" 
PN J10087699-A. 
PD 07-APR-1998. 
PF 15-JUL-1997; 205351. 
PR 16-JUL-1996; JP-186219. 
PA (ASAH ) ASAHI KASEI KOGYO KK. 
DR WPI; 98-267127/24. 
DR N-PSDB; V16978. 
PT Human Slit-like protein - useful for diagnosis and treatment of 
PT brain-specific diseases and cancers 
PS Disclosure; Pages 31-35; 45pp; Japanese. 
CC The present sequence represents a novel human slit-like protein (the 
CC mature protein is claimed in Claim 1). The slit-like polypeptide is 
CC useful for diagnosis and treatment of brain-specific diseases and 
CC cancers. Antibodies directed against the protein, or its fragments 
CC can also be used for diagnosing cancer. 
SQ Sequence 1534 AA; 



Query Match 88.4%; Score 1026; DB 30; Length 1534; 

Best Local Similarity 91.3%; Pred. No. 1.17e-76; 

Matches 137; Conservative 8; Mismatches 2; Indels 3; Gaps 1; 

Db 1068 rcecmpgyagdncsenqddcrdhrcqngaqcmdevnsysclcaegysgqlceipphlpap 1127 
        ||||| ||:|||||||||||:||:|||||||:||||||:||| ||||||||||||   || 
Qy    8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPP---AP 64 

Db 1128 kspcegtecqngancvdqgnrpvcqclpgfggpecekllsvnfvdrdtylqftdlqnwpr 1187 
        :|:||||||||||||||||:|||||||||||||||||||||||||||||||||||||||| 
Qy   65 RSSCEGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFTDLQNWPR 124 

Db 1188 anitlqvstaedngillyngdndhiavely 1217 
        |||||||||||||||||||||||||||||| 
Qy  125 ANITLQVSTAEDNGILLYNGDNDHIAVELY 154 
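
The "Best Local Similarity" figure printed with each alignment appears to be the number of identical positions divided by the total number of aligned columns (matches, conservative substitutions, mismatches and indels). This is inferred from the counts in this report, not taken from program documentation; the function name is hypothetical.

```python
def best_local_similarity(matches: int, conservative: int,
                          mismatches: int, indels: int) -> float:
    """Identical positions as a percentage of all aligned columns."""
    aligned = matches + conservative + mismatches + indels
    return round(100.0 * matches / aligned, 1)


# Counts printed for RESULT 1 (W46966): 137 / 8 / 2 / 3
print(best_local_similarity(137, 8, 2, 3))
```

For RESULT 1 this gives 137 identities over 150 columns, in line with the 91.3% shown in the report.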



RESULT 2 

ID R25079 standard; Protein; 1480 AA. 

AC R25079; 

DT 05-JAN-1993 (first entry) 

DE Drosophila SLIT protein involved in axon pathway development. 

KW Neurogenesis; EGF-like repeats; epidermal growth factor; TAGON; 

KW embryonic CNS; leucine-rich repeat; Flank-LRR-Flank; 

KW midline glial cells; axonogenesis; cell-cell interaction; ss. 

OS Drosophila melanogaster. 

FH Key Location/Qualifiers 

FT peptide 1..36 

FT /label= signal 

FT domain 73..294 

FT /label= Flank_LRR_Flank_1 

FT /note= "mediates adhesive events" 

FT domain 295..518 

FT /label= Flank_LRR_Flank_2 

FT /note= "mediates adhesive events" 

FT domain 519..714 

FT /label= Flank_LRR_Flank_3 

FT /note= "mediates adhesive events" 

FT domain 715..910 

FT /label= Flank_LRR_Flank_4 

FT /note= "mediates adhesive events" 

FT region 911..1150 

FT /label= Tandem_EGF_like_repeats 

FT /note= "involved in protein-protein interactions" 

FT region 1353..1393 

FT /label= 7th_EGF_like_repeat 

FT /note= "involved in receptor-ligand interactions" 

FT region 1394..1404 

FT /label= alternative_splice_segment 

FT /note= "developmentally regulated" 

FT region 1405..1480 

FT /label= C-terminal region 

PN WO9210518-A. 

PD 25-JUN-1992. 

PF 27-NOV-1991; (J09055. 

PR 07-DEC-1990; US-624135. 

PA (UYYA ) UNIV YALE. 

PI Artavanis-Tsakonas S, Rothberg JM; 

DR WPI; 92-234590/28. 

DR N-PSDB; 025811. 

PT SLIT protein and sequence elements for treating 

PT neuro-degenerative disease - useful for Alzheimer's disease, 

PT nerve damage and Parkinson's disease, for diagnosis of cancer 

PS Claim 1; Page 84-89; 122pp; English. 

CC The SLIT protein is necessary for normal development of the midline 

CC of the CNS, partic. the midline glial cells, and for the 

CC concomitant formation of the commissural axon pathways. The process 

CC is dependent on the level of SLIT protein expression. It appears 

CC that SLIT protein is excreted by the midline glial cells where it 

CC is synthesised and is eventually associated with the surfaces of 

CC axons that traverse them. The SLIT protein is tightly localised to 

CC the muscle attachment sites and to the sites of contact between 

CC adjacent pairs of cardioblasts as they coalesce to form the lumen 

CC of the larval heart. The SLIT protein defines a new set of 

CC molecules ( TAGONS ) which play a key role in axon outgrowth and 

CC pathfinding. SLIT can be used as a nerve regenerative in 

CC neurodegenerative diseases such as Alzheimer's Disease, spinal cord 

CC injuries, brain injuries, crushed (optic) nerve, amyotrophic lateral 

CC sclerosis, diabetes-caused nerve damage, Parkinson's Disease, 

CC strokes, epilepsy, multiple sclerosis, paraplegia, retinal 

CC degeneration and facial nerve damage. The 4 "Flank-Leucine-rich 

CC region-Flank" domains, the C-terminal region and part of the 

CC alternative splice segment (i.e. GEGSTEPFTVT) are all individually 



CC claimed as are molecules comprising at least 1 Flank-LRR-Flank domain 

CC and at least 1 EGF-like repeat element from the SLIT protein. 

CC See also R29102. 

SQ Sequence 1480 AA; 

Query Match 39.8%; Score 462; DB 5; Length 1480; 

Best Local Similarity 42.7%; Pred. No. 1.72e-28; 



Matches [counts illegible] 

Db 1047 [sequence illegible] 
Qy    6 HHRCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCE---IP 61 

Db 1107 [sequence illegible] 
Qy   62 [sequence illegible] 

Db 1167 [sequence illegible] 
Qy  118 [sequence illegible] 



RESULT 3 

ID W18348 standard; protein; 520 AA. 

AC W18348; 

DT 11-FEB-1998 (first entry) 

DE Proliferation and differentiation suppression polypeptide. 

KW Proliferation; differentiation; suppression; human; delta-1; 

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour; 

KW immunosuppression. 

OS Homo sapiens. 

PN WO9719172-A1. 

PD 29-MAY-1997. 

PF 15-NOV-1996; J03356. 

PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 3; Page 59-61; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the 

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression. 

SQ Sequence 520 AA; 

Query Match 28.9%; Score 335; DB 25; Length 520; 

Best Local Similarity 44.8%; Pred. No. 5.96e-18; 

Matches 43; Conservative 20; Mismatches 30; Indels 3; Gaps 2; 

Db 408 crcqagfsgrhcddnvddcasspcanggtcrdgvndfsctcppgytgrncs-apvsr--c 464 

I I h: :| : III I ||: I I II ::| I Ihh I :| :| I 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 465 ehapchngatcherghryvcecargyggpncqfllp 500 

I : hill I ::l I Ihl hllhh lh 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLS 104 
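
Each database record in this report is a series of lines that begin with a two-letter code (ID, AC, DT, DE, KW, OS, PN, and so on). As an illustration only, the sketch below groups such lines by their code and joins the KW lines into a keyword list. The function name and the choice of a dictionary of lists are assumptions for the example, not part of any tool used to produce the report.

```python
from collections import defaultdict


def parse_record(lines):
    """Group record lines by their leading two-letter line code."""
    fields = defaultdict(list)
    for line in lines:
        code, _, rest = line.strip().partition(" ")
        # Accept only lines that start with a two-letter upper-case code.
        if len(code) == 2 and code.isupper() and rest.strip():
            fields[code].append(rest.strip())
    return dict(fields)


# A few lines copied from RESULT 3 above.
record = [
    "ID W18348 standard; protein; 520 AA.",
    "AC W18348;",
    "KW Proliferation; differentiation; suppression; human; delta-1;",
    "KW serrate-1; blood cell; neuron; leukaemia; malignant tumour;",
    "KW immunosuppression.",
    "OS Homo sapiens.",
]
fields = parse_record(record)
keywords = [k.strip(" .") for k in " ".join(fields["KW"]).split(";")
            if k.strip(" .")]
print(keywords)
```

Lines whose code was lost or damaged in scanning would simply be skipped by this check, so the record text should be repaired before it is parsed.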



RESULT 4 

ID W18349 standard; protein; 702 AA. 

AC W18349; 

DT 11-FEB-1998 (first entry) 

DE Proliferation and differentiation suppression polypeptide. 

KW Proliferation; differentiation; suppression; human; delta-1; 

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour; 

KW immunosuppression. 

OS Homo sapiens. 

PN WO9719172-A1. 

PD 29-MAY-1997. 

PF 15-NOV-1996; J03356. 

PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 4; Page 61-64; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the 

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression. 

SQ Sequence 702 AA; 

Query Match 28.9%; Score 335; DB 25; Length 702; 

Best Local Similarity 44.8%; Pred. No. 5.96e-18; 

Matches 43; Conservative 20; Mismatches 30; Indels 3; Gaps 2; 

Db 408 crcqagfsgrhcddnvddcasspcanggtcrdgvndfsctcppgytgrncs-apvsr--c 464 

I I I::! :| :| III I II: I I II ::| I l|:|: I :| :| I 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 465 ehapchngatcherghryvcecargyggpncqfllp 500 

I : hill I -I I Ihl 1 : 1 1 1 1 1 : ||: 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLS 104 



RESULT 5 

ID W18353 standard; protein; 723 AA. 

AC W18353; 

DT 11-FEB-1998 (first entry) 

DE Proliferation and differentiation suppression polypeptide. 

KW Proliferation; differentiation; suppression; human; delta-1; 

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour; 

KW immunosuppression. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT Peptide 1..21 

FT /label= Signal 

FT Protein 22..723 

FT /label= Differentiation_suppression_protein 

PN WO9719172-A1. 

PD 29-MAY-1997. 

PF 15-NOV-1996; J03356. 

PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

DR N-PSDB; T70174. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 15; Page 77-82; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the 

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression. 

SQ Sequence 723 AA; 

Query Match 28.9%; Score 335; DB 25; Length 723; 

Best Local Similarity 44.8%; Pred. No. 5.96e-18; 

Matches 43; Conservative 20; Mismatches 30; Indels 3; Gaps 2; 

Db 429 crcqagfsgrhcddnvddcasspcanggtcrdgvndfsctcppgytgrncs-apvsr--c 485 



I I l:H :l '\ III III: I I II ::l I Ihl: I :| :| I 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 486 ehapchngatcherghryvcecargyggpncqfllp 521 

I : hill I ::| I ihl hlihh l|: 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLS 104 



RESULT 6 

ID W11719 standard; Protein; 727 AA. 

AC W11719; 

DT 28-APR-1997 (first entry) 

DE C-Delta-1 polypeptide. 

KW C-Delta-1; cell proliferation; nervous system disorder; 

KW tissue regeneration; Notch; cervix cancer; breast cancer; 

KW lung cancer; colon cancer; melanoma; seminoma; 

KW neurogenesis; therapy. 



OS Gallus sp. 
FH Key Location/Qualifiers 
FT domain 184..228 
FT /label= DSL 
FT domain 229..261 
FT /label= EGF1 
FT domain 262..292 
FT /label= EGF2 
FT domain 293..332 
FT /label= EGF3 
FT domain 333..370 
FT /label= EGF4 
FT domain 371..409 
FT /label= EGF5 
FT domain 410..447 
FT /label= EGF6 
FT domain 448..485 
FT /label= EGF7 
FT domain 486..523 
FT /label= EGF8 
FT domain 524..534 
FT /label= EGF9 
FT domain 555..579 
FT /label= TM 
FT /note= "transmembrane domain" 



PN WO9701571-A1. 

PD 16-JAN-1997. 

PF 28-JUN-1996; U11178. 

PR 28-JUN-1995; US-000589. 

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY. 

PA (UYYA ) UNIV YALE. 

PI Artavanis-Tsakonas S, Gray GE, Henrique D, Ish-Horowicz D; 

PI Lewis J; 

DR WPI; 97-100159/09. 

DR N-PSDB; T58897. 

PT New vertebrate Delta protein, DNA and antibodies - for treating and 

PT preventing cancer, nervous system disorders and for tissue 

PT regeneration 

PS Disclosure; Fig 2; 135pp; English. 

CC C-delta-1 polypeptide (W11719) is the chick homologue of Drosophila 

CC Delta, a protein that binds to Notch protein. Expression of 

CC C-Delta-1 correlates with onset of neurogenesis. The C-delta-1 

CC amino acid sequence was deduced from a cDNA clone (T58897) obtd. 

CC from chick stage 4-6 embryos. An alternatively spliced variant 

CC (W00876) was also isolated, and mouse (W11720) and human (W11721- 

CC 38) Delta-1 polypeptides have been identified. Delta-1 proteins 

CC can be used to treat or prevent disorders characterised by 

CC increased Notch activity, such as cervical, breast, lung or colon 

CC cancer, melanoma or seminoma, and nervous system disorders or to 

CC promote tissue regeneration and repair. 

SQ Sequence 727 AA; 

Query Match 28.0%; Score 325; DB 21; Length 727; 

Best Local Similarity 44.8%; Pred. No. 3.93e-17; 

Matches 43; Conservative 18; Mismatches 32; Indels 3; Gaps 2; 

Db 436 cqcqagftgrhcddnvddcasfpcvnggtcqdgvndysctcppgyngkncstp-vsr--c 492 

1:1 hll :| :| III 'hi || |:| | ||:| | | :| | 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 493 ehnpchngatchersnryvcecargygglncqfllp 528 

I hill I ::::! 1 1 : 1 |:|| :|: l|: 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLS 104 



RESULT 7 

ID W00876 standard; Protein; 740 AA. 

AC W00876; 

DT 28-APR-1997 (first entry) 

DE C-Delta-1 polypeptide (alternatively spliced variant). 

KW C-Delta-1; cell proliferation; nervous system disorder; 

KW tissue regeneration; Notch; cervix cancer; breast cancer; 

KW lung cancer; colon cancer; melanoma; seminoma; 



OS Gallus sp. 
FH Key Location/Qualifiers 
FT domain 184..228 
FT /label= DSL 
FT domain 229..261 
FT /label= EGF1 
FT domain 262..292 
FT /label= EGF2 
FT domain 293..332 
FT /label= EGF3 
FT domain 333..370 
FT /label= EGF4 
FT domain 371..409 
FT /label= EGF5 
FT domain 410..447 
FT /label= EGF6 
FT domain 448..485 
FT /label= EGF7 
FT domain 486..523 
FT /label= EGF8 
FT domain 524..534 
FT /label= EGF9 
FT domain 555..579 
FT /label= TM 
FT /note= "transmembrane domain" 
PN WO9701571-A1. 
PD 16-JAN-1997. 
PF 28-JUN-1996; U11178. 
PR 28-JUN-1995; US-000589. 
PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY. 
PA (UYYA ) UNIV YALE. 
PI Artavanis-Tsakonas S, Gray GE, Henrique D, Ish-Horowicz D; 
PI Lewis J; 
DR WPI; 97-100159/09. 
DR N-PSDB; T58898. 



PT New vertebrate Delta protein, DNA and antibodies - for treating and 

PT preventing cancer, nervous system disorders and for tissue 

PT regeneration 

PS Disclosure; Fig 2; 135pp; English. 

CC c-delta-1 polypeptide (W00876) is the chick homologue of Drosophila 

CC Delta, a protein that binds to Notch protein. Expression of 

CC C-Delta-1 correlates with onset of neurogenesis. The C-delta-1 

CC amino acid sequence was deduced from a cDNA clone (T58898) obtd. 

CC from chick stage 4-6 embryos. A shorter version (W58877) of 

CC C-Delta-1, lacking the 12 C-terminal amino acids of the longer 

CC version, was also isolated, and mouse (W11720) and human (W11721- 

CC 38) Delta-1 polypeptides have been identified. Delta-1 proteins 

CC can be used to treat or prevent disorders characterised by 

CC increased Notch activity, such as cervical, breast, lung or colon 

CC cancer, melanoma or seminoma, and nervous system disorders or to 

CC promote tissue regeneration and repair. 

SQ Sequence 740 AA; 

Query Match 28.0%; Score 325; DB 21; Length 740; 

Best Local Similarity 44.8%; Pred. No. 3.93e-17; 



Matches 43; Conservative 18; Mismatches 32; Indels 3; Gaps 2; 

Db 436 cqcqagftgrhcddnvddcasfpcvnggtcqdgvndysctcppgyngkncstp-vsr--c 492 

hi hll :l :l III M: I I II hi I Ihl I I :| I 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 493 ehnpchngatchersnryvcecargygglncqfllp 528 

I hill I ::::| Ihl hll :|: lh 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLS 104 



RESULT 8 

ID W11725 standard; Protein; 660 AA. 

AC W11725; 

DT 28-APR-1997 (first entry) 

DE H-Delta-1 polypeptide (reading frame 1 product). 

KW H-Delta-1; cell proliferation; nervous system disorder; 



Location/Qualifiers 
219..221 
/note= "Delta-1 homologous region" 
245..246 
/note= "Delta-1 homologous region" 
259..428 
/note= "Delta-1 homologous region" 
430..434 
/note= "Delta-1 homologous region" 
594..597 
/note= "Delta-1 homologous region" 
605..608 
/note= "Delta-1 homologous region" 
615..617 
/note= "Delta-1 homologous region" 
FT misc_difference 32 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 40 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 41 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 87 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 138 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 145 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 162 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 187 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 203 
FT /note= "undetermined amino acid residue" 
FT misc_difference 230 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 249 
FT /note= "undetermined amino acid residue" 
FT misc_difference 429 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 447 
FT /note= "undetermined amino acid residue" 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 498 
FT /note= "undetermined amino acid residue" 
FT misc_difference 513 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 541 
FT /note= "undetermined amino acid residue" 
FT misc_difference 552 
FT /note= "undetermined amino acid residue" 
FT misc_difference 556 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 564 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 576 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 580 
FT /note= "undetermined amino acid residue" 
FT misc_difference 619 
FT /note= "undetermined amino acid residue" 
FT misc_difference 621 
FT /note= "undetermined amino acid residue" 
FT misc_difference 626 
FT /note= "undetermined amino acid residue" 
FT misc_difference 630 
FT /note= "undetermined amino acid residue" 
FT misc_difference 634 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 639 
FT /note= "undetermined amino acid residue" 
FT misc_difference 642 
FT /note= "undetermined amino acid residue" 
FT misc_difference 643 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 644 
FT /note= "undetermined amino acid residue" 
FT misc_difference 647 
FT /note= "residue corresponds to stop codon in 
FT H-Delta-1 contig" 
FT misc_difference 648 
FT /note= "undetermined amino acid residue" 
FT misc_difference 651 
FT /note= "undetermined amino acid residue" 
FT misc_difference 652 
FT /note= "undetermined amino acid residue" 
PN WO9701571-A1. 

PD 16-JAN-1997. 

PF 28-JUN-1996; U11178. 

PR 28-JUN-1995; US-000589. 

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY. 

PA (UYYA ) UNIV YALE. 

PI Artavanis-Tsakonas S, Gray GE, Henrique D, Ish-Horowicz D; 

PI Lewis J; 

DR WPI; 97-100159/09. 

DR N-PSDB; T59454. 

PT New vertebrate Delta protein, DNA and antibodies - for treating and 

PT preventing cancer, nervous system disorders and for tissue 

PT regeneration 

PS Disclosure; Fig 12B1-6; 135pp; English. 

CC Polypeptide sequences (W11725-27) were determined for all 3 

CC reading frames of a human H-delta-1 contig sequence (T59454) obtd. 

CC from a foetal brain library. Errors in the contig sequence meant 

CC that no single reading frame gave the correct sequence for the 

CC H-Delta-1 protein. The 3 polypeptide sequences were therefore 

CC compared to chick and mouse Delta-1 sequences (see also W11719-20) 

CC and regions of homology (see also W11728-38) identified. H-Delta-1 

CC is the human homologue of Drosophila Delta, a protein that binds 



CC to Notch protein. H-Delta-1 polypeptides can be used to treat 

CC disorders of cell fate or differentiation, such as cancer, and 

CC nervous system disorders, or to promote tissue regeneration and 

CC repair. 

SQ Sequence 660 AA; 

Query Match 27.8%; Score 323; DB 21; Length 660; 

Best Local Similarity 43.8%; Pred. No. 5.73e-17; 

Matches 42; Conservative 20; Mismatches 31; Indels 3; Gaps 2; 

Db 341 crcqagfsgrhcddnvddcasspcanggtcrdgvndfsctcppgytgrncs-apasr--c 397 

I I h:l H H III I II: I I II ::| I l|:|: I :||:| I 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 398 ehapchngatcherghryxcecarsyggpncxfllp 433 

I : 1:111 I ::| I |:| ::|||:| ||: 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLS 104 



RESULT 9 

ID W11720 standard; Protein; 722 AA. 

AC W11720; 

DT 28-APR-1997 (first entry) 

DE M-Delta-1 polypeptide. 

KW M-Delta-1; cell proliferation; nervous system disorder; 

KW tissue regeneration; Notch; cervix cancer; breast cancer; 

KW lung cancer; colon cancer; melanoma; seminoma; 

KW neurogenesis; therapy. 

OS Mus sp. 

PN WO9701571-A1. 

PD 16-JAN-1997. 

PF 28-JUN-1996; U11178. 

PR 28-JUN-1995; US-000589. 

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY. 

PA (UYYA ) UNIV YALE. 

PI Artavanis-Tsakonas S, Gray GE, Henrique D, Ish-Horowicz D; 

PI Lewis J; 

DR WPI; 97-100159/09. 

DR N-PSDB; T58899. 

PT New vertebrate Delta protein, DNA and antibodies - for treating and 

PT preventing cancer, nervous system disorders and for tissue 

PT regeneration 

PS Claim 4; Fig 8; 135pp; English. 

CC M-delta-1 polypeptide (W11720) is the mouse homologue of Drosophila 

CC Delta, a protein that binds to Notch protein. It is expressed 

CC primarily in presomitic mesoderm, the central and peripheral 

CC nervous systems, and kidney. Chick (W11719) and human (W11721- 

CC 38) Delta-1 polypeptides have also been identified. Delta-1 

CC proteins can be used to treat or prevent disorders characterised by 

CC increased Notch activity, such as cervical, breast, lung or colon 

CC cancer, melanoma or seminoma, as well as nervous system disorders, 

CC and to promote tissue regeneration and repair.

SQ Sequence 722 AA; 

Query Match 27.7%; Score 321; DB 21; Length 722; 

Best Local Similarity 43.8%; Pred. No. 8.35e-17;

Matches 42; Conservative 19; Mismatches 32; Indels 3; Gaps 2;

Db 428 crcqagfsgrycednvddcasspcanggtcrdsvndfsctcppgytgkncs-apvsr--c 484 

I I h:| I :| III I II: I I II ::l I ll:| I :| :| I 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 485 ehapchngatchqrgqrymcecaqgyggpncqfllp 520 

I : hill I ::! I :|:| Ml':'.: ||; 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLS 104 



RESULT 10 

ID W18351 standard; protein; 1036 AA. 
AC W18351; 

DT 11-FEB-1998 (first entry)

DE Proliferation and differentiation suppression polypeptide.
KW Proliferation; differentiation; suppression; human; delta-1;
KW serrate-1; blood cell; neuron; leukaemia; malignant tumour;

KW immunosuppression. 

OS Homo sapiens. 

PN WO9719172-A1.

PD 29-MAY-1997.

PF 15-NOV-1996; J03356.

PR 30-NOV-1995; JP-311811.

PR 17-NOV-1995; JP-299611.

PA (ASAH ) ASAHI KASEI KOGYO KK.

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

PT Peptide(s) encoded by human genes delta -1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells

PS Claim 5; Page 66-71; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression.

SQ Sequence 1036 AA;

Query Match 25.2%; Score 292; DB 25; Length 1036;

Best Local Similarity 40.0%; Pred. No. 1.92e-14;

Matches 38; Conservative 18; Mismatches 36; Indels 3; Gaps 2; 

Db 441 rcicppgyagdhcerdidecasnpclngghcqneinrfqclcptgfsgnlcqld-i--dy 497 

II I 1 1 : 1 E : I : l:| : I ||::| :|:| : III |:|| !:: 
Qy 8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSS 67

Db 498 cepnpcqngaqcynrasdyfckcpedyegkncshl 532
II HIM I :::| II : I :| I 
Qy 68 CEGTECQNGANCVDQGSRPVCQCLPGFGGPECEKL 102



RESULT 11 

ID W18352 standard; protein; 1187 AA. 

AC W18352; 

DT 11-FEB-1998 (first entry)

DE Proliferation and differentiation suppression polypeptide.

KW Proliferation; differentiation; suppression; human; delta-1;

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour;

KW immunosuppression.

OS Homo sapiens.

PN WO9719172-A1.

PD 29-MAY-1997. 

PF 15-NOV-1996; J03356. 

PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611.

PA (ASAH ) ASAHI KASEI KOGYO KK.

PI Itoh A, Sakano S;

DR WPI; 97-298110/27. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 6; Page 71-76; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression. 

SQ Sequence 1187 AA; 

Query Match 25.2%; Score 292; DB 25; Length 1187;

Best Local Similarity 40.0%; Pred. No. 1.92e-14;

Matches 38; Conservative 18; Mismatches 36; Indels 3; Gaps 2; 

Db 441 rcicppgyagdhcerdidecasnpclngghcqneinrfqclcptgfsgnlcqld-i--dy 497 

II I 1:11: : 1:1 : I ||::| :|:| : III hi II:: 
Qy 8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSS 67 



Db 498 cepnpcqngaqcynrasdyfckcpedyegkncshl 532 

II Mill I :::| II : I :| I 
Qy 68 CEGTECQNGANCVDQGSRPVCQCLPGFGGPECEKL 102 



RESULT 12 

ID W18354 standard; protein; 1218 AA. 

AC W18354; 

DT 11-FEB-1998 (first entry)

DE Proliferation and differentiation suppression polypeptide. 

KW Proliferation; differentiation; suppression; human; delta-1; 

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour; 

KW immunosuppression.

FH Key Location/Qualifiers

FT Peptide 1..31

FT /label= Signal

FT Protein 32..1218

FT /label= Differentiation_suppression_protein

PN WO9719172-A1.

PD 29-MAY-1997. 

PF 15-NOV-1996; J03356. 

PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK.

PI Itoh A, Sakano S;

DR WPI; 97-298110/27.

DR N-PSDB; T70175.

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 15; Page 83-91; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the 

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression. 

SQ Sequence 1218 AA;

Query Match 25.2%; Score 292; DB 25; Length 1218;

Best Local Similarity 40.0%; Pred. No. 1.92e-14;

Matches 38; Conservative 18; Mismatches 36; Indels 3; Gaps 2;

Db 472 rcicppgyagdhcerdidecasnpclngghcqneinrfqclcptgfsgnlcqld-i--dy 528 

II I Ihlhl : hi : I l|::| :|:| : III |:|| II:: 

Qy 8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSS 67

Db 529 cepnpcqngaqcynrasdyfckcpedyegkncshl 563 

II Hill I :::| II : I :| I 

Qy 68 CEGTECQNGANCVDQGSRPVCQCLPGFGGPECEKL 102 



RESULT 13

ID W05833 standard; Protein; 1218 AA.

AC W05833;

DT 28-JAN-1997 (first entry)

DE Human Serrate-1 (HJ1).

KW Serrate-1; human jagged-1; HJ1; Notch; cell differentiation;

KW cell fate; central nervous system; cancer; tissue repair; therapy;

KW diagnosis; antibody.

OS Homo sapiens.

FH Key Location/Qualifiers

FT domain 1..1067

FT /label= Extracellular_domain

FT peptide 14..29

FT /label= Sig_peptide

FT domain 185..229

FT /label= DSL

FT /note= "region of homology with Drosophila Delta

FT and Serrate, predicted to mediate binding

FT with Notch"

FT domain 234..896

FT /label= ELR

FT /note= "epidermal growth factor-like repeat domain"

FT region 234..264

FT /label= ELR1

FT region 265..299

FT /label= ELR2

FT region 300..339

FT /label= ELR3

FT region 340..377

FT /label= ELR4

FT region 378..415

FT /label= ELR5

FT region 416..453

FT /label= ELR6

FT region 454..490

FT /label= ELR7

FT region 491..528

FT /label= ELR8

FT region 529..566

FT /label= ELR9

FT region 567..598

FT /label= Partial_ELR

FT region 599..632

FT /label= Partial_ELR

FT region 633..670

FT /label= ELR10

FT region 671..708

FT /label= ELR11

FT region 709..747

FT /label= ELR12

FT region 748..785

FT /label= ELR13

FT region 786..823

FT /label= ELR14

FT region 824..862

FT /label= ELR15

FT region 863..879

FT /label= Partial_ELR

FT region 880..896

FT /label= Partial_ELR

FT domain 1068..1089

FT /label= Transmembrane_domain

FT domain 1090..1218

FT /label= Intracellular_domain

PN WO9627610-A1.

PD 12-SEP-1996.

PF 07-MAR-1996; U03172.

PR 07-MAR-1995; US-400159.

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY.

PA (UYYA ) UNIV YALE.

PI Artavanis-Tsakonas S, Gray GE, Henrique DMP, Ish-Horowicz D; 

PI Lewis JH, Mann RS, Myat AM; 

DR WPI; 96-425379/42.

DR N-PSDB; T40090.

PT Vertebrate Serrate protein and related DNA - used to treat or

PT prevent malignancies characterised by increased Notch activity.

PS Claim 4; Page 95-98; 161pp; English. 

CC Human Serrate-1 (W05833) and human Serrate-2 (W05833) are ligands 

CC for the zygotic neurogenic locus Notch, and are believed to play a 

CC major role in determining cell fates (differentiation) in the 

CC central nervous system. Their amino acid sequences were deduced 

CC from cDNA clones (see also T40090-91) isolated from human foetal 

CC brain cDNA libraries. The proteins, antibodies raised to them, 

CC and encoding nucleic acids can be used in the detection of 

CC Serrate sequences and in the treatment of disorders of cell fate 

CC or differentiation, partic. cancer, nervous system disorders

CC and in tissue repair or regeneration. 

SQ Sequence 1218 AA; 

Query Match 25.2%; Score 292; DB 19; Length 1218; 

Best Local Similarity 40.0%; Pred. No. 1.92e-14;

Matches 38; Conservative 18; Mismatches 36; Indels 3; Gaps 2;



ieinrfqclcptgfsgnlcqld-i--dy 528 
II I INN : 1:1 : I fh:| :|:| : III I : E J ||:: 
Qy 8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSS 67

Db 529 cepnpcqngaqcynrasdyfckcpedyegkncshl 563

II lllll I ::: II : I :| I 
Qy 68 CEGTECQNGANCVDQGSRPVCQCLPGFGGPECEKL 102



RESULT 14 

ID W44301 standard; Protein; 1218 AA. 

AC W44301; 

DT 19-JUN-1998 (first entry)

DE Human serrate 1. 

KW Human; serrate 2; regulation; stem cell; differentiation; neoplasm; 

KW leukaemia; endothelial cell; tumour. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT Peptide 1..31

FT /label= Signal

FT Protein 32..1218

FT /label= Serrate-1

PN WO9802458-A1.

PD 22-JAN-1998.

PF 11-JUL-1997; J02414.

PR 14-MAY-1997; JP-124063. 

PR 16-JUL-1996; JP-186220. 

PA (ASAH ) ASAHI KASEI KOGYO KK.

PI Itoh A, Sakano S;

DR WPI; 98-110528/10.

DR N-PSDB; V15201.

PT Human serrate-2 gene expression products - used to regulate stem

PT cell differentiation, useful in treating neoplasms, e.g. leukaemia

PS Disclosure; Page 77-86; 103pp; Japanese. 

CC The present sequence represents human serrate 1, from the present 

CC invention which describes human serrate 2. The present invention also 

CC describes a method for the preparation of the polypeptides, and 

CC antibodies binding to the polypeptide and its fragments. The polypeptide

CC and its fragments expressed by the serrate-2 gene can be used to inhibit

CC stem (especially blood stem) cell differentiation and to inhibit

CC endothelial cell growth. They may be incorporated in a cell culture

CC media for culturing undifferentiated stem cells. They can also be used

CC for treatment of neoplasms such as leukaemia. The antibodies can be used

CC for the diagnosis of malignant tumours. 

SQ Sequence 1218 AA; 

Query Match 25.2%; Score 292; DB 29; Length 1218;

Best Local Similarity 40.0%; Pred. No. 1.92e-14;

Matches 38; Conservative 18; Mismatches 36; Indels 3; Gaps 2; 

Db 472 rcicppgyagdhcerdidecasnpclngghcqneinrfqclcptgfsgnlcqld-i--dy 528 

II I M:||:| : 1:1 : I l|::| :|:| : III |:|| II:: 
Qy 8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSS 67 

Db 529 cepnpcqngaqcynrasdyfckcpedyegkncshl 563

II lllll I :::l II : I :| I 
Qy 68 CEGTECQNGANCVDQGSRPVCQCLPGFGGPECEKL 102



RESULT 15 

ID W05835 standard; Protein; 1193 AA. 

AC W05835; 

DT 28-JAN-1997 (first entry) 

DE Chick Serrate. 

KW C-Serrate; Notch; cell differentiation; cell fate; tissue repair; 

KW central nervous system; cancer; therapy; diagnosis. 

OS Gallus sp.

FH Key Location/Qualifiers

FT domain 1..1041

FT /label= Extracellular_domain

FT peptide 1..5

FT /label= Sig_peptide

FT /note= "lacks the N-terminal portion owing to
FT truncation of the encoding cDNA clone"

FT domain 158..203

FT /label= DSL

FT /note= "region of homology with Drosophila Delta

FT and Serrate; predicted to mediate binding

FT with Notch"

FT domain 208..837

FT /label= ELR

FT /note= "epidermal growth factor-like repeat domain"

FT region 208..238

FT /label= ELR1

FT region 239..274

FT /label= ELR2

FT region 275..313

FT /label= ELR3

FT region 314..351

FT /label= ELR4

FT region 352..390

FT /label= ELR5

FT region 391..427

FT /label= ELR6

FT region 428..464

FT /label= ELR7

FT region 465..502

FT /label= ELR8

FT region 503..540

FT /label= ELR9

FT region 541..606

FT /label= ELR10

FT region 607..644

FT /label= ELR11

FT region 655..682

FT /label= ELR12

FT region 683..721

FT /label= ELR13

FT region 722..759

FT /label= ELR14

FT region 760..797

FT /label= ELR15

FT region 798..837

FT /label= ELR16

FT region 854..911

FT /label= Cysteine-rich_region

FT domain 1042..1066

FT /label= Transmembrane_domain

FT domain 1067..1193

FT /label= Intracellular_domain


PN WO9627610-A1. 

PD 12-SEP-1996. 

PF 07-MAR-1996; U03172.

PR 07-MAR-1995; US-400159.

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY.

PA (UYYA ) UNIV YALE.

PI Artavanis-Tsakonas S, Gray GE, Henrique DMP, Ish-Horowicz D; 

PI Lewis JH, Mann RS, Myat AM; 

DR WPI; 96-425379/42. 

DR N-PSDB; T40092. 

PT Vertebrate Serrate protein and related DNA - used to treat or

PT prevent malignancies characterised by increased Notch activity.

PS Disclosure; Page 112-115; 161pp; English. 

CC Chicken Serrate (W05835), or C-Serrate, is a ligand for the zygotic 

CC neurogenic locus Notch and is believed to play a major role in 

CC determining cell fates in the central nervous system. Its amino 

CC acid sequence was deduced from a cDNA clone (T40092) obtd. from an 

CC optic explant cDNA library. C-Serrate is expressed in the central

CC nervous system, cranial placodes, nephric mesoderm, vascular 

CC system, and limb bud mesenchyme. 

SQ Sequence 1193 AA; 

Query Match 25.0%; Score 290; DB 19; Length 1193; 

Best Local Similarity 38.9%; Pred. No. 2.79e-14;

Matches 37; Conservative 18; Mismatches 37; Indels 3; Gaps 2; 



Db 446 rclcspgyagdhcekdinecasnpcmngghcqdeingfqclcpagfsgnlcqld-i--dy 502

II I 11:11:1 : ::| : l.ll::| l|:|:: III |:|| ||:: 
Qy 8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSS 67 

Db 503 cepnpcqngaqcfnlamdyfcncpedyegkncshl 537 

II lllll I : : I I : I :| I 
Qy 68 CEGTECQNGANCVDQGSRPVCQCLPGFGGPECEKL 102 



Search completed: Fri May 28 09:19:22 1999
Job time : 61 secs.
Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm



Run on: Fri May 28 09:19:40 1999; MasPar time 9.14 Seconds

675.417 Million cell updates/sec

Tabular output not generated.

Title: US-09-191-647-10

Description: (1-154) from US09191647.pep

Perfect Score: 1160

Sequence: 1 DPLPVHHRCECMLGYTGDNC EDNGILLYNGDNDHIAVELY 154

Scoring table: PAM 150
Gap 11

Searched: 122810 seqs, 40066593 residues

Post-processing: Minimum Match 0%

Listing first 45 summaries

Database: pir60

1:pir1 2:pir2 3:pir3 4:pir4

Statistics: Mean 39.965; Variance 75.1; scale 0.532



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
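
The scores listed below come from a Smith-Waterman local alignment of the query against each database entry. The following is only a minimal sketch of that recurrence, not the MPsrch implementation: the function name `smith_waterman_score` is hypothetical, the identity scoring (`match`, `mismatch`) stands in for the PAM 150 table, and treating "Gap 11" as a linear per-residue penalty is an assumption.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-11):
    """Return the best local alignment score of sequences a and b.

    Assumptions (not taken from the search output): identity scoring
    instead of PAM 150, and a linear gap penalty of 11 per gap position.
    """
    prev = [0] * (len(b) + 1)   # previous row of the score matrix
    best = 0
    for ca in a:
        curr = [0]              # first column is always zero
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap          # gap in b
            left = curr[j - 1] + gap    # gap in a
            cell = max(0, diag, up, left)  # local alignment never drops below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # A shared core "CQNGA" scores 5 matches * 2 under these toy parameters.
    print(smith_waterman_score("GGCQNGATT", "AACQNGACC"))
```

Each inner-loop iteration fills one matrix cell, which is presumably the unit counted by the "Million cell updates/sec" figure in the run header.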



                                SUMMARIES

Result         Query
   No.  Score  Match  Length  DB  ID      Description             Pred. No.

     1    462   39.8    1469   2  B36665  slit protein 2 precur   1.19e-72
     2    462   39.8    1480   2  A36665  slit protein 1 precur   1.19e-72
     3    448   38.6     530   2  A31640  epidermal growth fact   1.10e-69
     4    325   28.0     728   2  I50719  C-Delta-1 - chicken     4.53e-44
     5    324   27.9    1064   2  A40136  fibropellin Ia - sea    7.25e-44
     6    321   27.7     722   2  I48324  DELTA-like 1 - mouse    2.97e-43
     7    310   26.7     570   2  A48836  fibropellin C precurs   5.12e-41
     8    301   25.9    2703   2  A24420  notch protein - fruit   3.40e-39
     9    299   25.8    1203   2  A49175  Motch B protein - mou   8.62e-39
    10    299   25.8    2437   2  S42612  transmembrane protein   8.62e-39
    11    299   25.8    2471   2  A49128  cell-fate determining   8.62e-39
    12    292   25.2    1220   2  A56136  jagged protein precur   2.22e-37
    13    288   24.8    2318   2  S45306  notch 3 protein - mou   1.41e-36
    14    282   24.3     293   2  B26637  neurogenic repetitive   2.25e-35
    15    282   24.3    2139   2  A35672  crumbs protein - frui   2.25e-35
    16    282   24.3    2524   2  A35844  Xotch protein - Afric   2.25e-35
    17    282   24.3    2531   2  S18188  notch protein homolog   2.25e-35
    18    280   24.1    2321   2  S78549  notch3 protein - huma   5.66e-35
    19    279   24.1    2531   2  A46019  gene Notch-1 protein    8.96e-35
    20    278   24.0    2555   2  A40043  notch protein homolog   1.42e-34
    21    265   22.8    1404   2  A36666  serrate protein precu   5.47e-32
    22    265   22.8    1408   2  S16148  gene serrate protein    5.47e-32
    23    263   22.7     861   2  A48825  Notch homolog Motch p   1.36e-31
    24    249   21.5    1429   2  S06434  homeotic protein lin-   7.80e-29
    25    242   20.9     833   2  S19087  gene Delta protein pr   1.82e-27
    26    240   20.7     387   2  B49175  Motch A protein - mou   4.47e-27
    27    240   20.7     832   2  A31246  neurogenic protein De   4.47e-27
    28    240   20.7     880   2  S00670  gene Delta protein pr   4.47e-27
    29    237   20.4     200   2  A26637  neurogenic repetitive   1.71e-26
    30    237   20.4     385   2  S53718  homeotic protein dlk    1.71e-26
    31    237   20.4     385   2  A54785  preadipocyte factor 1   1.71e-26
    32    228   19.7    1295   2  A32901  glpl protein precurso   9.47e-25
    33    227   19.6    1257   2  S28764  neurocan - rat          1.48e-24
    34    225   19.4     260   2  A44549  fetal antigen 1 homeo   3.58e-24
    35    225   19.4    1268   2  S52781  neurocan - mouse        3.58e-24
    36    223   19.2     383   2  B45484  delta-like dlk homeot   8.68e-24
    37    223   19.2     383   2  S53716  homeotic protein dlk    8.68e-24
    38    223   19.2    3562   2  A47171  chondroitin sulfate p   8.68e-24
    39    222   19.1     259   2  S48713  fetal antigen 1 - hum   1.35e-23
    40    221   19.1    4391   2  A38096  perlecan precursor -    2.10e-23
    41    219   18.9     102   2  B55885  chondroitin sulfate p   5.07e-23
    42    219   18.9    2409   2  A60979  versican precursor -    5.07e-23
    43    218   18.8     473   2  A56175  adhesive plaque prote   7.88e-23
    44    217   18.7    5147   1  IJFFTM  cadherin-related tumo   1.22e-22
    45    215   18.5    2397   2  A55535  versican precursor -    2.94e-22


RESULT 1

ENTRY B36665 #type complete
TITLE slit protein 2 precursor - fruit fly (Drosophila
melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change
16-Dec-1998
ACCESSIONS B36665
REFERENCE A36665
#authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.;
Artavanis-Tsakonas, S.
#journal Genes Dev. (1990) 4:2169-2187
#title slit: an extracellular protein necessary for development of
midline glia and commissural axon pathways contains both
EGF and LRR domains.
#cross-references MUID:91099665
#accession B36665
##status preliminary
##molecule_type mRNA
##residues 1-1469 ##label ROT
##cross-references GB:X53959
GENETICS
#gene FlyBase:sli
##cross-references FlyBase:FBgn0003425
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF
homology; leucine-rich alpha-2-glycoprotein repeat
homology; proteoglycan carboxyl-terminal homology
FEATURE
66-91 #domain proteoglycan amino-terminal homology #label PAH1\
101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
#domain proteoglycan carboxyl-terminal homology #label PCS1\
#domain proteoglycan amino-terminal homology #label PAH2\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\


371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\
512-537 #domain proteoglycan amino-terminal homology #label PAH3\
547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\
708-733 #domain proteoglycan amino-terminal homology #label PAH4\
743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\
1028-1061 #domain EGF homology #label EGF
SUMMARY #length 1469 #molecular-weight 164695 #checksum 8361



Query Match 39.8%; Score 462; DB 2; Length 1469;

Best Local Similarity 42.7%; Pred. No. 1.19e-72;

Matches 67; Conservative 38; Mismatches 44; Indels 8; Gaps 4;



Db 1047 HYSCDCQAGFHGTNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMIS 1106

I 1:1 I: I lh:| II! :|:||||: III :| I I I : 1:1 II |: 
Qy 6 HHRCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCE----IP 61

Db 1107 MMYPQTSPCQNHECKHGVCFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELE 1166

|::| I: II :l III :|:| II: I I! I |::|| :::::: 
Qy 62 PA-PRSS-CEGTECQNGA--NCVDQGSRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFT 117

Db 1167 PLRTRPEANVTIVFSSAEQNGILMYDGQDAHLAVELF 1203 

I: :! Ihh 1 : 1 1 : 1 1 1 1 : 1 : 1 : : 1 : 1 1 1 1 : 
Qy 118 DLQNWPRANITLQVSTAEDNGILLYNGDNDHIAVELY 154 
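
The two percentages printed with each alignment appear to follow from the counts reported beside them: Query Match matches Score divided by Perfect Score, and Best Local Similarity matches Matches divided by the number of alignment columns (Matches + Conservative + Mismatches + Indels). This reading is inferred from the figures in this listing rather than from program documentation, and the function names below are hypothetical.

```python
def query_match_pct(score, perfect_score):
    """Score as a percentage of the query's self-alignment (perfect) score."""
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all alignment columns."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


if __name__ == "__main__":
    # RESULT 1 above: Score 462, Perfect Score 1160,
    # Matches 67; Conservative 38; Mismatches 44; Indels 8.
    print(f"{query_match_pct(462, 1160):.1f}")
    print(f"{best_local_similarity_pct(67, 38, 44, 8):.1f}")
```

The same check can be applied to any hit in the listing to spot a digit that was misread during scanning.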



RESULT 2

ENTRY A36665 #type complete
TITLE slit protein 1 precursor - fruit fly (Drosophila
melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change
24-Sep-1998
ACCESSIONS A36665; S13523
REFERENCE A36665
#authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.;
Artavanis-Tsakonas, S.
#journal Genes Dev. (1990) 4:2169-2187
#title slit: an extracellular protein necessary for development of
midline glia and commissural axon pathways contains both
EGF and LRR domains.
#cross-references MUID:91099665
#accession A36665
##status preliminary
##molecule_type mRNA
##residues 1-1480 ##label ROT
##cross-references GB:X53959; NID:g8614; PID:g8615
GENETICS
#gene FlyBase:sli
##cross-references FlyBase:FBgn0003425
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF
homology; leucine-rich alpha-2-glycoprotein repeat
homology; proteoglycan carboxyl-terminal homology
KEYWORDS alternative splicing
FEATURE



66-91 #domain proteoglycan amino-terminal homology #label PAH1\
101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\
288-313 #domain proteoglycan amino-terminal homology #label PAH2\
323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\
512- #domain proteoglycan amino-terminal homology #label PAH3\
#domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\
708-733 #domain proteoglycan amino-terminal homology #label PAH4\
743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
791-814 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR17\
815-838 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR18\
846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\
1028-1061 #domain EGF homology #label EGF
SUMMARY #length 1480 #molecular-weight 165751 #checksum 900



Query Match 39.8%; Score 462; DB 2; Length 1480;

Best Local Similarity 42.7%; Pred. No. 1.19e-72;

Matches 67; Conservative 38; Mismatches 44; Indels 8;

Db 1047

I hi I: I II::! III :|:||||: III :| I I I : |:| II 
Qy 6 HHRCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCE - - 

Db 1107

|::| I: II :| II! :h! II: I II I l::l! 
Qy 62

Db 1167 PLRTRPEANVTIVFSSAEQNGILMYDGQDAHLAVELF 1203



Tue Jun 1 10:15:55 1999 



US-09-191-647-10.rpr 



Page 3 



I: :l 11:1: I : I h 1 1 i I : I : I : : : I h 
Qy 118 DLQNWPRANITLQVSTAEDNGILLYNGDNDHIAVELY 154



RESULT 3 

ENTRY A31640 #type fragment

TITLE epidermal growth factor-like protein slit - fruit fly

(Drosophila melanogaster) (fragment)
ORGANISM #formal_name Drosophila melanogaster

DATE 28-Feb-1990 #sequence_revision 28-Feb-1990 #text_change

14-Aug-1998
ACCESSIONS A31640
REFERENCE A31640

#authors Rothberg, J.M.; Hartley, D.A.; Walther, Z.;

Artavanis-Tsakonas, S.
#journal Cell (1988) 55:1047-1059

#title slit: An EGF-homologous locus of D. melanogaster involved in
the development of the embryonic central nervous system.
#cross-references MUID:89077533
#accession A31640
##molecule_type DNA
##residues 1-530 ##label ROT
##cross-references GB:M23543; NID:g340939; PID:g514357
GENETICS

#gene FlyBase:sli

##cross-references FlyBase:FBgn0003425
#introns 470/3
CLASSIFICATION #superfamily EGF homology
KEYWORDS growth factor
FEATURE

148-181 #domain EGF homology #label EGF
SUMMARY #length 530 #checksum 6330



Query Match 38.6%; Score 448; DB 2; Length 530; 

Best Local Similarity 41.4%; Pred. No. 1.10e-69;

Matches 65; Conservative 39; Mismatches 44; Indels 9; Gaps 5; 

Db 167 HYSCDCQAGFHGTNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMIS 226 

I M ' I II::! III :|:|lll: III :| II I : |:| I |: 
Qy 6 HHRCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCE----IP 61

Db 227 MMYPQTSPCQNHECKHGVCFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELE 286 

l-l \- II :l III II: I II I |::|| :::::: 

Qy 62 PA-PRSS-CEGTECQNGA--NCVDQGSRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFT 117 

Db 287 PLRTRPEANVTI-VFSSGQNGILMYDGQDAHLAVELF 322
W : : : : I :llll:|:|:: |:||||: 

Qy 118 DLQNWPRANITLQVSTAEDNGILLYNGDNDHIAVELY 154 



RESULT 4 

ENTRY I50719 #type complete

TITLE C-Delta-1 - chicken

ORGANISM #formal_name Gallus gallus #common_name chicken

DATE 13-Sep-1996 #sequence_revision 13-Sep-1996 #text_change

14-Aug-1998
ACCESSIONS I50719
REFERENCE I50719

#authors Henrique, D.; Adam, J.; Myat, A.; Chitnis, A.; Lewis, J.;
Ish-Horowicz, D.

#journal Nature (1995) 375:787-790

#title Expression of a Delta homologue in prospective neurons in the
chick.

#cross-references MUID:95319507
#accession I50719

##status preliminary; translated from GB/EMBL/DDBJ
##molecule_type mRNA
##residues 1-728 ##label HEN
##cross-references EMBL:U26590; NID:g882411; PID:g882412
CLASSIFICATION #superfamily EGF homology
FEATURE

454-485 #domain EGF homology #label EGF

SUMMARY #length 728 #molecular-weight 79861 #checksum 1765



Query Match 28.0%; Score 325; DB 2; Length 728; 

Best Local Similarity 44.8%; Pred. No. 4.53e-44;

Matches 43; Conservative 18; Mismatches 32; Indels 3; Gaps 2;

Db 436 CQCQAGFTGRHCDDNVDDCASFPCVNGGTCQDGVNDYSCTCPPGYNGKNCSTP-VSR-C 492 

M hll :l :l III I l|: I I II |:| I l|:| I I :| I 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 493 EHNPCHNGATCHERSNRYVCECARGYGGLNCQFLLP 528 

I hill I Ihl hll :|: ||: 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLS 104 



RESULT 5 

ENTRY A40136 #type complete

TITLE fibropellin Ia - sea urchin (Strongylocentrotus purpuratus)

ALTERNATE_NAMES epidermal growth factor homolog precursor
CONTAINS alternatively spliced fibropellin Ib (EGFI)
ORGANISM #formal_name Strongylocentrotus purpuratus #common_name

purple urchin

DATE 13-May-1992 #sequence_revision 17-Sep-1997 #text_change

07-Aug-1998

ACCESSIONS A40136; B40136; C40136; A29316; A43131
REFERENCE A40136

#authors Delgadillo-Reynoso, M.G.; Rollo, D.R.; Hursh, D.A.; Raff,
R.A.

#journal J. Mol. Evol. (1989) 29:314-327

#title Structural analysis of the uEGF gene in the sea urchin

Strongylocentrotus purpuratus reveals more similarity to
vertebrate than to invertebrate genes with EGF-like



#cross-references MUID:90112459
#accession A40136

##status preliminary

##molecule_type mRNA

##residues 1-114 ##label DEL

##cross-references GB:X17530; NID:g10225; PID:g667061
#accession B40136

##status preliminary; not compared with conceptual translation

##molecule_type DNA
##residues 181-251, 329-370, 'R', 372-408, 'RA', 411-441 ##label DE2
#accession C40136

##status preliminary; not compared with conceptual translation

##molecule_type DNA

##residues T, 747-821, 898-978 ##label DE3
REFERENCE A29316

#authors Hursh, D.A.; Andrews, M.E.; Raff, R.A.

#journal Science (1987) 237:1487-1490

#title A sea urchin gene encodes a polypeptide homologous to

epidermal growth factor.
#cross-references MUID:87319677
#accession A29316

##status preliminary

##molecule_type mRNA

##residues 'S', 280-481, 786-1064 ##label HUR
##cross-references GB:MX7421; NID:g161474; PID:g552260
REFERENCE A43131
#authors Hunt, L.T.; Barker, W.C.
#journal FASEB J. (1989) 3:1760-1764

#title Avidin-like domain in an epidermal growth factor homolog from

a sea urchin.
#cross-references MUID:89196806
#contents annotation
COMMENT EGF homology repeats 10-17 are spliced out in the short form
(fibropellin Ib).

CLASSIFICATION #superfamily C1r/C1s repeat homology; EGF homology
FEATURE

1-19 #domain signal sequence #status predicted #label SIG\

20-1064 #product fibropellin I #status predicted #label FIB\

23-54 #domain EGF homology #label EG01\

57-175 #domain C1r/C1s repeat homology #label CSR\

180-211 #domain EGF homology #label EG02\
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218-249 
256-287 
294-325 
332-363 
370-401 
408-439 
446-477 
484-515 
522-553 
560-591 
598-629 
636-667 
674-705 
712-743 
750-781 
788-819 
826-857 
864-895 
902-933 
936-1064 
. 23-34,28-43,45-54, 
162-88,180-191, 
1185-200, 202-211, 
218-229,223-238, 
240-249,256-267, 
261-276,278-287, 
294-305,299-314, 
316-325,332-343, 
337-352,354-363, 
370-381,375-390, 
392-401,408-419, 
413-428,430-439, 
446-457,451-466, 
468-477,484-495 
489-504,506-515, 
522-533,527-542, 
544-553,560-571, 
565-580,582-591, 
598-609,603-618, 
620-629,636-647, 
641-656,658-667, 
674-685,679-694, 
696-705,712-723, 
717-732,734-743, 
750-761,755-770, 
772-781,788-799, 
793-808,810-819, 
826-837,831-846, 
848-857,864-875, 
1869-884,886-895, 
1902-913,907-922, 
r 924-933 



#domain 
ftdomain 
((domain 
((domain 
((domain 
((domain 
((domain 
((domain 
((domain 
ttdomain 
((domain 
((domain 
((domain 
((domain 
tdomain 
((domain 
tdomain 
((domain 
((domain 
((region 



EGF homology 
EGF homology 
EGF homology 
EGF homology 
EGF homology 
EGF homology 
EGF homology 
EGF homology 
EGF homology 
EGF homology 
EGF homology 
EGF homology 
EGF homology 
EGF homology 
EGF homology 
EGF homology 
EGF homology 
EGF homology 
EGF homology 
avidin-like\ 



ilabel EG03\ 

tlabel EG04\ 

tlabel EG05\ 

Ilabel EG06\ 

tlabel EG07\ 

♦label EG08\ 

tlabel EG09\ 

tlabel EG10\ 

tlabel EG11\ 

tlabel EG12\ 

tlabel EG13\ 

tlabel EG14\ 

tlabel EG15\ 

tlabel EG16\ 

tlabel EG17\ 

tlabel EG18\ 

tlabel EG19\ 

tlabel EG20\ 

tlabel EG21\ 



tdisulfidejonds tstatus predicted\ 



SUMMARY 



tdisulfidejonds tstatus predicted 



((length 1064 tmolecular-weight 112072 ((checksum 303 



Query Match 27.9%; Score 324; DB 2; Length 1064; 

Best Local Similarity 46.7%; Pred. No. 7.25e-44;

Matches 43; Conservative 16; Mismatches 30; Indels 3; Gaps 2;

Db 808 CACVPGFTGSNCETNIDECASDPCLNGGICVDGVNGFVCQCPPNYSGTYCEIS-L--DAC 864 

I h hi II I hi I II: III II:: I I III III: :| 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 865 RSMPCQNGATCVNVGADYVCECVPGYAGQNCE 896 

: lllll II: I: ll:|:||::| :|| 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100



RESULT 6

ENTRY I48324 #type complete
TITLE DELTA-like 1 - mouse
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 02-Jul-1996 #sequence_revision 02-Jul-1996 #text_change
   28-Feb-1997
ACCESSIONS I48324
REFERENCE I48324
   #authors Bettenhausen, B.; de Angelis, M.H.; Simon, D.; Guenet, J.L.;
      Gossler, A.
   #journal Development (1995) 121:2407-2418
   #title Transient and restricted expression during mouse
      embryogenesis of Dll1, a murine gene closely related to
      Drosophila Delta.
   #cross-references MUID:95401858
   #accession I48324
   ##status preliminary; translated from GB/EMBL/DDBJ
   ##molecule_type mRNA
   ##residues 1-722 ##label RES
   ##cross-references EMBL:X80903; NID:g806569; PID:g806570
GENETICS
   #gene Dll1
SUMMARY #length 722 #molecular-weight 78448 #checksum 1452

Query Match 27.7%; Score 321; DB 2; Length 722;

Best Local Similarity 43.8%; Pred. No. 2.97e-43;

Matches 42; Conservative 19; Mismatches 32; Indels 3; Gaps 2; 

Db 428 CRCQAGFSGRYCEDNVDDCASSPCANGGTCRDSVNDFSCTCPPGYTGKNCS-APVSR--C 484 

I I l::l I :l III I II: I I II ::| I M I :| :| I 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 485 EHAPCHNGATCHQRGQRYMCECAQGYGGPNCQFLLP 520 

I : 1:111 I ::| I :|:| 1 : 1 1 1 : 1 : lh 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLS 104 



RESULT 7

ENTRY A48836 #type complete
TITLE fibropellin C precursor - sea urchin (Strongylocentrotus
   purpuratus)
ALTERNATE_NAMES EGF repeat-containing protein; epidermal growth
   factor-related protein 3; fibropellin III
ORGANISM #formal_name Strongylocentrotus purpuratus #common_name
   purple urchin
DATE 01-Dec-1993 #sequence_revision 18-Nov-1994 #text_change
   07-Aug-1998
ACCESSIONS A48836
REFERENCE A48836
   #authors Bisgrove, B.W.; Raff, R.A.
   #journal Dev. Biol. (1993) 157:526-538
   #title The SpEGF III gene encodes a member of the fibropellins: EGF
      repeat-containing proteins that form the apical lamina of
      the sea urchin embryo.
   #cross-references MUID:93273088
   #accession A48836
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-570 ##label BIS
   ##cross-references GB:L07045; NID:g310659; PID:g310650
   ##note sequence extracted from NCBI backbone (NCBIN:132724,
      NCBIP:132725)
CLASSIFICATION #superfamily C1r/C1s repeat homology; EGF homology
FEATURE
   1-18 #domain signal sequence #status predicted #label SIG\
   19-570 #product fibropellin C #status predicted #label FIB\
   19-54 #domain EGF homology #label EGF1\
   57-175 #domain C1r/C1s repeat homology #label C1R\
   176-211 #domain EGF homology #label EGF2\
   214-249 #domain EGF homology #label EGF3\
   252-287 #domain EGF homology #label EGF4\
   290-325 #domain EGF homology #label EGF5\
   328-363 #domain EGF homology #label EGF6\
   366-401 #domain EGF homology #label EGF7\
   404-439 #domain EGF homology #label EGF8\
   442-570 #region avidin-like\
   23-34,28-43,45-54,
   62-88,180-191,
   185-200,202-211,
   218-229,223-238,
   240-249,256-267,
   261-276,278-287,
   294-305,299-314,
   316-325,332-343,
   337-352,354-363,
   370-381,375-390,
   392-401,408-419,
   413-428,430-439 #disulfide_bonds #status predicted
SUMMARY #length 570 #molecular-weight 61115 #checksum 5567

Query Match 26.7%; Score 310; DB 2; Length 570;

Best Local Similarity 42.4%; Pred. No. 5.12e-41; 

Matches 39; Conservative 19; Mismatches 31; Indels 3; Gaps 2; 

Db 314 CDCRAGFTGSNCETNINECASSPCLNGGSCLDGVDGYVCQCLPNYTGTHCEIS-L-DAC 370 

fhl hll II I ::l I II: hi h:| I I: hi III: :| 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68

Db 371 ASLPCQNGGVCTNVGGDYVCECLPGYTGINCE 402
: Mil: I : |: Ihlllh I :|| 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100



RESULT 8 

ENTRY A24420 #type complete
TITLE notch protein - fruit fly (Drosophila melanogaster)
ALTERNATE_NAMES neurogenic repetitive locus protein
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Jun-1987 #sequence_revision 30-Jun-1987 #text_change
   07-Aug-1998
ACCESSIONS A24420; A24768; S09358; A05267
REFERENCE A24420
   #authors Kidd, S.; Kelley, M.R.; Young, M.W.
   #journal Mol. Cell. Biol. (1986) 6:3094-3108
   #cross-references MUID:87064624
   #accession A24420
   ##molecule_type DNA
   ##residues 1-2703 ##label KID
   ##cross-references GB:K03508; NID:g157991; PID:g157993
REFERENCE A24768
   #authors Wharton, K.A.; Johansen, K.M.; Xu, T.; Artavanis-Tsakonas, S.
   #journal Cell (1985) 43:567-581
   #cross-references MUID:86079539
   #accession A24768
   ##molecule_type mRNA
   ##residues 1-48, 'I', 50-118, 'R', 120-230, 'I', 232-256, 'W', 258-266, 'A',
      268-872, 'R', 874-958, 'R', 960-1970, 'FH', 1973-2256, 'G',
      2258-2264, 'T', 2266-2406, 'R', 2408-2444, 'L', 2446-2703
      ##label WHA1
   ##note the authors translated the codon ATC for residue 49 as
      Thr, ATT for residue 2044 as Arg, GTA for residue 2265
      as Ala, CGC for residue 2407 as His, and CTT for
      residue 2445 as Arg
REFERENCE S09358
   #authors Tautz, D.
   #journal Nucleic Acids Res. (1989) 17:6463-6471
   #title Hypervariability of simple sequences as a general source for
      polymorphic DNA markers.
   #cross-references MUID:89385974
   #accession S09358
   ##molecule_type DNA
   ##residues 2505-2551, 'QQQQ', 2552-2576, 'E', 2578-2604 ##label TAU
REFERENCE A05267
   #authors Wharton, K.A.; Yedvobnick, B.; Finnerty, V.G.;
      Artavanis-Tsakonas, S.
   #journal Cell (1985) 40:55-62
   #title opa: a novel family of transcribed repeats shared by the
      Notch locus and other developmentally regulated loci in D.
      melanogaster.
   #cross-references MUID:85099329
   #accession A05267
   ##molecule_type DNA
   ##residues 2504-2576, 'E', 2578-2611 ##label WHA2
GENETICS
   #gene notch; opa
   ##cross-references FlyBase:FBgn0004647
   #map_position 8.96-9.36
   #introns 53/3; 84/3; 171/3; 240/3; 283/3; 2333/3; 2436/3; 2588/3
CLASSIFICATION #superfamily notch protein; ankyrin repeat homology; EGF
   homology
KEYWORDS differentiation; tandem repeat; transmembrane protein
FEATURE
   27-43 #domain transmembrane #status predicted #label TMM1\
   568-599 #domain EGF homology #label EGF\
   1746-1762 #domain transmembrane #status predicted #label TMM2\
   1950-1982 #domain ankyrin repeat homology #label AN1\
   1983-2015 #domain ankyrin repeat homology #label AN2\
   1988-2004 #domain transmembrane #status predicted #label TMM3\
   2017-2049 #domain ankyrin repeat homology #label AN3\
   2050-2082 #domain ankyrin repeat homology #label AN4\
   2083-2115 #domain ankyrin repeat homology #label AN5\
   2538-2568 #region glutamine-rich\
   2538-2568 #domain neurogenic repetitive element #status predicted
      #label OPA
SUMMARY #length 2703 #molecular-weight 288876 #checksum 6404

Query Match 25.9%; Score 301; DB 2; Length 2703;

Best Local Similarity 40.7%; Pred. No. 3.40e-39;

Matches 37; Conservative 19; Mismatches 32; Indels 3; Gaps 2;

Db 1008 CTCPLGFSGINCQTNDEDCTESSCLNGGSCIDGINGYNCSCLAGYSGANCQYKLN-K-C 1064
I I ll::l II l"ll : I II: |:| :IM I h llll h : I 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68

Db 1065 DSNPCLNGATCHEQNNEYTCHCPSGFTGKQC 1095
I III I :| : hi HI I :| 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPEC 99



RESULT 9 

ENTRY A49175 #type fragment
TITLE Motch B protein - mouse (fragment)
ALTERNATE_NAMES Notch homolog
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 21-Jan-1994 #sequence_revision 05-Jan-1996 #text_change
   14-Aug-1998
ACCESSIONS A49175; PH1570; S32113
REFERENCE A49175
   #authors Lardelli, M.; Lendahl, U.
   #journal Exp. Cell Res. (1993) 204:364-372
   #title Motch A and Motch B--two mouse Notch homologues coexpressed
      in a wide variety of tissues.
   #cross-references MUID:93178563
   #accession A49175
   ##status preliminary; nucleic acid sequence not shown
   ##molecule_type mRNA
   ##residues 1-1203 ##label LAR
   ##cross-references EMBL:X68279; NID:g287989; PID:g287990
   ##experimental_source embryo
   ##note sequence extracted from NCBI backbone (NCBIP:126158)
COMMENT This protein has many EGF repeats and lin-12/Notch repeats.
COMMENT This protein is one of the neurogenic proteins controlling the
   decision between ectodermal and neural fate for cells in the early
   embryo.
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
   repeat homology; EGF homology
FEATURE
   560-591 #domain EGF homology #label EGF
SUMMARY #length 1203 #checksum 910

Query Match 25.8%; Score 299; DB 2; Length 1203; 

Best Local Similarity 38.7%; Pred. No. 8.62e-39;

Matches 36; Conservative 20; Mismatches 34; Indels 3; Gaps 2; 

Db 124 HCECLKGYAGPRCEMDINECHSDPCQNDATCLDKIGGFTCLCMPGFKGVHCELE-V--NE 180
:|!h 11:1 I : ::| III | |:| : :::|||: |: | ||: : 
Qy 8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSS 67 

Db 181 CQSNPCVNNGQCVDKVNRFQCLCPPGFTGPVCQ 213 

I- II: III :| I I III II |: 
Qy 68 CEGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100 



RESULT 10 

ENTRY S42612 #type complete
TITLE transmembrane protein precursor - zebra fish
ORGANISM #formal_name Brachydanio rerio #common_name zebra fish
DATE 20-Feb-1995 #sequence_revision 20-Feb-1995 #text_change
   10-Jul-1998
ACCESSIONS S42612
REFERENCE S42612
   #authors Bierkamp, C.; Campos-Ortega, J.A.
   #journal Mech. Dev. (1993) 43:87-100
   #title A zebrafish homologue of the Drosophila neurogenic gene Notch
      and its pattern of transcription during early
      embryogenesis.
   #cross-references MUID:94128602
   #accession S42612
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-2437 ##label BIE
   ##cross-references EMBL:X69088; NID:g433866; PID:g433867
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
   repeat homology
FEATURE
   1915-1947 #domain ankyrin repeat homology #label AN1\
   1948-1980 #domain ankyrin repeat homology #label AN2\
   1982-2014 #domain ankyrin repeat homology #label AN3\
   2015-2047 #domain ankyrin repeat homology #label AN4\
   2048-2080 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2437 #molecular-weight 262306 #checksum 4021

Query Match 25.8%; Score 299; DB 2; Length 2437;

Best Local Similarity 44.1%; Pred. No. 8.62e-39; 

Matches 41; Conservative 14; Mismatches 34; Indels 4; Gaps 3; 

Db 474 HCICMPGYEGVFCQINSDDCASQPCLNG-KCIDKINSFHCECPKGFSGSLCQVD-V--DE 529 

H II II I I I III : I II |:| | | |:|| ||:: 
Qy 8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSS 67

Db 530 CASTPCKNGAKCTDGPNKYTCECTPGFSGIHCE 562 

I :l I 'I; I :: |:| |||:| 
Qy 68 CEGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100

RESULT 11

ENTRY A49128 #type complete
TITLE cell-fate determining gene Notch2 protein - rat
ORGANISM #formal_name Rattus norvegicus #common_name Norway rat
DATE 21-Jan-1994 #sequence_revision 18-Nov-1994 #text_change
   14-Aug-1998
ACCESSIONS A49128
REFERENCE A49128
   #authors Weinmaster, G.; Roberts, V.J.; Lemke, G.
   #journal Development (1992) 116:931-941
   #title Notch2: a second mammalian Notch gene.
   #cross-references MUID:93202015
   #accession A49128
   ##status preliminary; not compared with conceptual translation
   ##molecule_type mRNA
   ##residues 1-2471 ##label WEI
   ##experimental_source Schwann cell
   ##note sequence extracted from NCBI backbone (NCBIP:127811)
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
   repeat homology; EGF homology
FEATURE
   1029-1060 #domain EGF homology #label EGF\
   1876-1908 #domain ankyrin repeat homology #label AN1\
   1909-1941 #domain ankyrin repeat homology #label AN2\
   1943-1975 #domain ankyrin repeat homology #label AN3\
   1976-2008 #domain ankyrin repeat homology #label AN4\
   2009-2041 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2471 #molecular-weight 265367 #checksum 5929

Query Match 25.8%; Score 299; DB 2; Length 2471; 

Best Local Similarity 38.7%; Pred. No. 8.62e-39;

Matches 36; Conservative 20; Mismatches 34; Indels 3; Gaps 2; 

Db 441 HCECLKGYAGPRCEMDINECHSDPCQNDATCLDKIGGFTCLCMPGFKGVHCELE-V--NE 497 

:IN: I: I : ::| ||| | |:| : :::|||: : | ||: : 
Qy 8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSS 67

Db 498 CQSNPCVNNGQCVDKVNRFQCLCPPGFTGPVCQ 530 

I:: II: III :| I I III II |: 
Qy 68 CEGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100 



RESULT 12 

ENTRY A56136 #type complete
TITLE jagged protein precursor - rat
ORGANISM #formal_name Rattus norvegicus #common_name Norway rat
DATE 28-Apr-1995 #sequence_revision 28-Apr-1995 #text_change
   11-Aug-1995
ACCESSIONS A56136
REFERENCE A56136
   #authors Lindsell, C.E.; Shawber, C.J.; Boulter, J.; Weinmaster, G.
   #journal Cell (1995) 80:909-917
   #title Jagged: a mammalian ligand that activates Notch1.
   #cross-references MUID:95211842
   #accession A56136
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-1220 ##label LIN
   ##cross-references GB:L38483
SUMMARY #length 1220 #molecular-weight 134528 #checksum 2746

Query Match 25.2%; Score 292; DB 2; Length 1220;

Best Local Similarity 40.0%; Pred. No. 2.22e-37;

Matches 38; Conservative 18; Mismatches 36; Indels 3; Gaps 2; 

Db 473 RCICPPGYAGDHCERDIDECASNPCLNGGHCQNEINRFQCLCPTGFSGNLCQLD-I--DY 529 

II I 11:11:1 : hi : I ll::| :hl : III |:|| ||:: 
Qy 8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSS 67 

Db 530 CEPNPCQNGAQCYNRASDYFCKCPEDYEGKNCSHL 564 

II lllll I :::| II : I :| I 
Qy 68 CEGTECQNGANCVDQGSRPVCQCLPGFGGPECEKL 102 



RESULT 13 

ENTRY S45306 #type complete
TITLE notch 3 protein - mouse
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 20-Feb-1995 #sequence_revision 20-Feb-1995 #text_change
   10-Jul-1998
ACCESSIONS S45306
REFERENCE S45306
   #authors Lardelli, M.; Dahlstrand, J.; Lendahl, U.
   #journal Mech. Dev. (1994) 46:123-136
   #title The novel Notch homologue mouse Notch 3 lacks specific
      epidermal growth factor-repeats and is expressed in
      proliferating neuroepithelium.
   #cross-references MUID:95001556
   #accession S45306
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-2318 ##label LAR
   ##cross-references EMBL:X74760; NID:g483580; PID:g483581
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
   repeat homology
FEATURE
   1839-1871 #domain ankyrin repeat homology #label AN1\
   1872-1904 #domain ankyrin repeat homology #label AN2\
   1906-1938 #domain ankyrin repeat homology #label AN3\
   1939-1971 #domain ankyrin repeat homology #label AN4\
   1972-2004 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2318 #molecular-weight 244245 #checksum 9358

Query Match 24.8%; Score 288; DB 2; Length 2318;

Best Local Similarity 41.9%; Pred. No. 1.41e-36;

Matches 39; Conservative 16; Mismatches 35; Indels 3; Gaps 2; 

Db 456 CICMAGFTGTYCEVDIDECQSSPCVNGGVCKDRVNGFSCICPSGFSGSMCQLD-V--DEC 512 

I II 1:11 I : 1:1 I II: I I ll:::l I |:|| :|:: I 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 513 ASTPCRNGAKCVDQPDGYECRCAEGFEGTLCER 545
A :| 1:111:1111 |:| II I II: 

Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEK 101



RESULT 14

ENTRY B26637 #type fragment
TITLE neurogenic repetitive locus 95F protein - fruit fly
   (Drosophila melanogaster) (fragment)
ORGANISM #formal_name Drosophila melanogaster
DATE 16-Aug-1988 #sequence_revision 16-Aug-1988 #text_change
   14-Aug-1998
ACCESSIONS B26637
REFERENCE A91081
   #authors Knust, E.; Dietrich, U.; Tepass, U.; Bremer, K.A.; Weigel,
      D.; Vaessin, H.; Campos-Ortega, J.A.
   #journal EMBO J. (1987) 6:761-766
   #title EGF homologous sequences encoded in the genome of Drosophila
      melanogaster, and their relation to neurogenic genes.
   #cross-references MUID:87218537
   #accession B26637
   ##molecule_type mRNA
   ##residues 1-293 ##label KNU
   ##cross-references GB:X05144; NID:g7519; PID:g929536
GENETICS
   #gene FlyBase:crb
   ##cross-references FlyBase:FBgn0000368
CLASSIFICATION #superfamily EGF homology
KEYWORDS transmembrane protein
FEATURE
   216-252 #domain EGF homology #label EGF
SUMMARY #length 293 #checksum 3413



Query Match 24.3%; Score 282; DB 2; Length 293; 

Best Local Similarity 39.6%; Pred. No. 2.25e-35; 

Matches 38; Conservative 19; Mismatches 34; Indels 5; Gaps 5; 

Db 159 CQCQPGFEGQHCEQNIDECADQPCHNGGNCTDLIASYVCDCPEDYMGPQCDVLKQMTCEN 218 

1:1 h l::l :| hi I: hll: II : II I I I I I |:: : 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEI-PP-A-PR 65

Db 219 EPCRNGSTCQNGFN-ASTGNNFTCTCVPGFEGPLCD 253 

M :|: MM I h I hill II h 
Qy 66 SSC-EGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100 



RESULT 15

ENTRY A35672 #type complete
TITLE crumbs protein - fruit fly (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 21-Sep-1990 #sequence_revision 18-Nov-1992 #text_change
   14-Aug-1998
ACCESSIONS A35672
REFERENCE A35672
   #authors Tepass, U.; Theres, C.; Knust, E.
   #journal Cell (1990) 61:787-799
   #title crumbs encodes an EGF-like protein expressed on apical
      membranes of Drosophila epithelial cells and required for
      organization of epithelia.
   #cross-references MUID:90263104
   #accession A35672
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-2139 ##label TEP
   ##cross-references GB:M33753
   ##note the authors translated the codon GGC for residue 1928 as
      Cys, and TAT for residue 2023 as Gln
GENETICS
   #gene FlyBase:crb
   ##cross-references FlyBase:FBgn0000368
CLASSIFICATION #superfamily EGF homology
KEYWORDS transmembrane protein
FEATURE
   691-722 #domain EGF homology #label EGF
SUMMARY #length 2139 #molecular-weight 233619 #checksum 7230

Query Match 24.3%; Score 282; DB 2; Length 2139;

Best Local Similarity 39.6%; Pred. No. 2.25e-35;

Matches 38; Conservative 19; Mismatches 34; Indels 5; Gaps 5;

Db 1821 CQCQPGFEGQHCEQNIDECADQPCHNGGNCTDLIASYVCDCPEDYMGPQCDVLKQMTCEN 1880
hi I: MM :| hi h hlh I I : II I I.I I I |:: : 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEI-PP-A-PR 65

Db 1881 EPCRNGSTCQNGFN-ASTGNNFTCTCVPGFEGPLCD 1915
M :h MM I h I hill II |: 
Qy 66 SSC-EGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100

Search completed: Fri May 28 09:20:00 1999
Job time : 20 secs.



srch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 09:20:17 1999; MasPar time 6.36 Seconds
 684.284 Million cell updates/sec

Tabular output not generated.

Title: >US-09-191-647-10
Description: (1-154) from US09191647.pep
Perfect Score: 1160
Sequence: 1 DPLPVHHRCECMLGYTGDNC EDNGILLYNGDNDHIAVELY 154

Scoring table: PAM 150
 Gap 11

Searched: , 28268293 residues

Post-processing: Minimum Match 0%
 Listing first 45 summaries

Swiss-prot37
1:swissprot

Statistics: Mean 40.992; Variance 67.443; scale 0.608

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.
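
The "Million cell updates/sec" figure in the run header is consistent with the usual throughput measure for dynamic-programming alignment: one cell per (query residue, database residue) pair, divided by the run time. That reading is an assumption inferred from the reported numbers, and the function name is hypothetical. A short Python sketch:

```python
def million_cell_updates_per_sec(query_len, db_residues, seconds):
    """Throughput of a Smith-Waterman search in millions of cells/sec.

    Assumes every query residue is compared against every database
    residue exactly once (one matrix cell per residue pair).
    """
    cells = query_len * db_residues
    return cells / seconds / 1e6


# Header values: 154-residue query, 28268293 residues, 6.36 seconds.
rate = million_cell_updates_per_sec(154, 28_268_293, 6.36)
print(f"{rate:.1f} million cell updates/sec")
```

The result lands within a fraction of a percent of the reported 684.284; the small difference is what rounding of the printed run time would produce.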



                              SUMMARIES

            Query
 No. Score  Match Length DB ID          Description             Pred. No.

   1   462   39.8   1480  1 SLIT_DROME  SLIT PROTEIN PRECURSOR  7.20e-83
   2   324   27.9   1064  1 FBP1_STRPU  FIBROPELLIN I PRECURSO  4.14e-50
   3   321   27.7    722  1 DLL1_MOUSE  DELTA-LIKE PROTEIN 1 P  2.05e-49
   4   321   27.7    723  1 DLL1_HUMAN  DELTA-LIKE PROTEIN 1 P  2.05e-49
   5   315   27.2    714  1 DLL1_RAT    DELTA-LIKE PROTEIN 1 P  5.03e-48
   6   310   26.7    570  1 FBP3_STRPU  FIBROPELLIN C PRECURSO  7.18e-47
   7   301   25.9   2703  1 NOTC_DROME  NEUROGENIC LOCUS NOTCH  8.47e-45
   8   299   25.8   2437  1 NOTC_BRARE  NEUROGENIC LOCUS NOTCH  2.44e-44
   9   288   24.8   2318  1 NTC3_MOUSE  NEUROGENIC LOCUS NOTCH  8.01e-42
  10   282   24.3   2139  1 CRB_DROME   CRUMBS PROTEIN PRECURS  1.86e-40
  11   282   24.3   2524  1 NOTC_XENLA  NEUROGENIC LOCUS NOTCH  1.86e-40
  12   282   24.3   2531  1 NTC1_RAT    NEUROGENIC LOCUS NOTCH  1.86e-40
  13   279   24.1   2531  1 NTC1_MOUSE  NEUROGENIC LOCUS NOTCH  8.95e-40
  14   278   24.0   2444  1 NTC1_HUMAN  NEUROGENIC LOCUS NOTCH  1.51e-39
  15   272   23.4   1964  1 NTC4_MOUSE  NEUROGENIC LOCUS NOTCH  3.45e-38
  16   265   22.8   1408  1 SERR_DROME  SERRATE PROTEIN PRECUR  1.31e-36
  17   249   21.5   1429  1 LI12_CAEEL  LIN-12 PROTEIN PRECURS  4.99e-33
  18   240   20.7    880  1 DL_DROME    NEUROGENIC LOCUS DELTA  4.94e-31
  19   237   20.4    385  1 DLK_MOUSE   DELTA-LIKE PROTEIN PRE  2.27e-30
  20   228   19.7   1295  1 GLP1_CAEEL  GLP-1 PROTEIN PRECURSO  2.15e-28
  21   227   19.6   1257  1 PGCN_RAT    NEUROCAN CORE PROTEIN   3.56e-28
  22   225   19.4   1268  1 PGCN_MOUSE  NEUROCAN CORE PROTEIN   9.75e-28
  23   223   19.2    383  1 DLK_HUMAN   DELTA-LIKE PROTEIN PRE  2.66e-27
  24   223   19.2   3562  1 PGCV_CHICK  VERSICAN CORE PROTEIN   2.66e-27
  25   221   19.1   4393  1 PGBM_HUMAN  BASEMENT MEMBRANE-SPEC  7.24e-27
  26   219   18.9   3396  1 PGCV_HUMAN  VERSICAN CORE PROTEIN   1.97e-26
  27   217   18.7   5147  1 FAT_DROME   CADHERIN-RELATED TUMOR  5.34e-26
  28   215   18.5   3358  1 PGCV_MOUSE  VERSICAN CORE PROTEIN   1.45e-25
  29   212   18.3    862  1 PGCV_MACNE  VERSICAN CORE PROTEIN   6.42e-25
  30   205   17.7   4544  1 LRP1_HUMAN  LOW-DENSITY LIPOPROTEI  2.04e-23
  31   200   17.2    432  1 NEL2_RAT    NEL-LIKE PROTEIN (FRAG  2.38e-22
  32   199   17.2   3707  1 PGBM_MOUSE  BASEMENT MEMBRANE-SPEC  3.88e-22
  33   192   16.6    428  1 NEL2_HUMAN  NEL-LIKE PROTEIN (FRAG  1.17e-20
  34   193   16.6   1376  1 NID2_HUMAN  NIDOGEN-2 PRECURSOR (N  7.20e-21
  35   184   15.9   1955  1 AGRI_CHICK  AGRIN PRECURSOR.        5.53e-19
  36   183   15.8    515  1 APX1_CAEEL  APX-1 PROTEIN PRECURSO  8.92e-19
  37   183   15.8   1959  1 AGRI_RAT    AGRIN PRECURSOR.        8.92e-19
  38   181   15.6   2109  1 PGCA_CHICK  AGGRECAN CORE PROTEIN   2.32e-18
  39   174   15.0    816  1 NEL_HUMAN   NEL PROTEIN PRECURSOR   6.43e-17
  40   174   15.0   4543  1 LRP1_CHICK  LOW-DENSITY LIPOPROTEI  6.43e-17
  41   173   14.9    816  1 NEL_CHICK   NEL PROTEIN PRECURSOR   1.03e-16
  42   172   14.8    564  1 PGCA_CANFA  AGGRECAN CORE PROTEIN   1.65e-16
  43   172   14.8    816  1 NEL_MOUSE   NEL PROTEIN PRECURSOR   1.65e-16
  44   172   14.8   2871  1 FBN1_MOUSE  FIBRILLIN 1 PRECURSOR.  1.65e-16
  45   171   14.7   2871  1 FBN1_HUMAN  FIBRILLIN 1 PRECURSOR.  2.64e-16
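
The Query Match column in the summaries above appears to be each result's score expressed as a percentage of the Perfect Score given in the search header (1160 for this query). This relationship is inferred from the listed values rather than from the program's documentation, and the names below are hypothetical. A minimal Python sketch:

```python
PERFECT_SCORE = 1160  # "Perfect Score" from the search header


def query_match(score, perfect_score=PERFECT_SCORE):
    """Score as a percentage of the perfect (self-match) score.

    Assumed definition, inferred from the summary table.
    """
    return round(100.0 * score / perfect_score, 1)


# (score, reported Query Match) pairs copied from the summary table.
for score, reported in [(462, 39.8), (324, 27.9), (282, 24.3), (171, 14.7)]:
    print(score, query_match(score), reported)
```

Each computed value agrees with the reported percentage for the rows shown in the loop.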



ID SLIT.DROME STANDARD; PRT; 1480 AA, 

AC P24014; 

DT 01-MAR-1992 (REL. 21, CREATED) 

DT 01-MAR-1992 (REL. 21, LAST SEQUENCE UPDATE) 

DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE) 

DE SLIT PROTEIN PRECURSOR. 

GN SLI. 

OS DROSOPHILA MELANOGASTER (FRUIT FLY). 

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 91099665. 

RA ROTHBERG J.M., JACOBS J.R., GOODMAN C,S,, ARTAVANIS-TSAKONAS S.; 

RT "Slit: an extracellular protein necessary for development of midline 

RT glia and commissural axon pathways contains both EGF and LRR 

RT domains,"; 

RL GENES DEV, 4:2169-2187(1990). 

CC -!- FUNCTION: NECESSARY FOR DEVELOPMENT OF MIDLINE GLIA AND 

CC COMMISSURAL AXON PATHWAYS, SLIT MAY INTERACT WITH EXTRACELLULAR 

CC MATRIX MOLECULES. 

CC -!- TISSUE SPECIFICITY: EXCRETED BY THE MIDLINE GLIA CELLS AND 
CC EVENTUALLY DISTRIBUTED ALONG THE AXONS, 

CC -I- ALTERNATIVE PRODUCTS: GIVES RISE TO 2 DISTINCT PROTEINS DIFFERING 

CC BY 11 AA AT THE C'TERMINUS OF THE LAST EGF REPEAT , 

CC •!- SIMILARITY: CONTAINS 7 EGF- LIKE DOMAINS, 

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN 

CC MANY PROTEINS. NUMBER IN THIS PROTEIN; 22. TWO BLOCK OF 6 LRR'S 

CC AND TWO BLOCKS OF 5 LRR'S. 

CC -!- SIMILARITY: CONTAINS A C -TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK). 

CC 

CC This SWISS -PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute, There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; X53959; G8615; -. 

DR PIR; A36665; A36665, 

DR FLYBASE; FBgn0003425; sli. 

DR PROSITE; PS00010; ASXJYDROXYL; 3. 

DR PROSITE; PS00022; EGF_1; 7. 

DR PROSITE; PS01185; CTCK_1; 1, 
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DR 


PROSITE; PS01186; EGFJ; 5. 




FT 


DISULFID 


973 982 BY SIMILARITY. 


DR 


PROSITE; PS01187; EGF.CA; 2. 




FT 


DISULFID 


989 1001 BY SIMILARITY. 


DR 


PROSITE; PS01225; CTCKJ; 1. 




FT 


DISULFID 


995 1010 BY SIMILARITY. 


DR 


PFAM; PF00007; Cys knot; 1. 




FT 


DISULFID 


1012 1021 BY SIMILARITY. 


DR 


PFAM; PF00008; EGF; 7. 




FT 


DISULFID 


1028 1041 BY SIMILARITY, 


DR 


PFAM; PF00054; laminin.G; 1. 




FT 


DISULFID 


1035 1050 BY SIMILARITY. 


DR 


PFAM; PF00560; LRR; 10. 




FT 


DISULFID 


1052 1061 BY SIMILARITY, 


DR 


HSSP; P00740; 1IXA. 




FT 


DISULFID 


1068 1079 BY SIMILARITY. 


KW 


NEUROGENESIS; GLYCOPROTEIN; SIGNAL; ALTERNATIVE SPLICING; 


FT 


DISULFID 


1073 1088 BY SIMILARITY. 


KW 


EGF-LIKE DOMAIN; 


REPEAT; LEUCINE -REPEAT; DUPLICATION. 




DISULFID 


1090 1099 BY SIMILARITY. 


FT 


SIGNAL 


1 


36 




FT 


DISULFID 


1115 1125 BY SIMILARITY. 


FT 


CHAIN 


37 


1480 


SLIT PROTEIN. 




DISULFID 


1120 1137 BY SIMILARITY, 


FT 


DOMAIN 


70 


104 


CONSERVED N-FLANKING REGION OF THE LRR, 


FT 


DISULFID 


1139 1148 BY SIMILARITY, 


FT 


DOMAIN 


105 


230 


LEUCINE-RICH REPEATS (1ST REGION). 


FT 


DISULFID 


1357 1368 BY SIMILARITY. 


FT 


DOMAIN 


231 


2j)4 


CONSERVED C-FLANKING REGION OF THE LRR, 


FT 


DISULFID 


1362 1380 BY SIMILARITY. 


FT 


DOMAIN 


295 


326 


CONSERVED N-FLANKING REGION OF THE LRR, 


FT 


DISULFID 


1382 1391 BY SIMILARITY. 


FT 


DOMAIN 


327 


452 


LEUCINE-RICH REPEATS (2ND REGION). 


FT   DOMAIN      453    518       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN      519    550       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      551    653       LEUCINE-RICH REPEATS (3RD REGION).
FT   DOMAIN      654    714       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN      715    746       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      747    848       LEUCINE-RICH REPEATS (4TH REGION).
FT   DOMAIN      849    910       CONSERVED C-FLANKING REGION OF THE LRR.
FT   REPEAT      105    115       LRR 1-1.
FT   REPEAT      116    139       LRR 1-2.
FT   REPEAT      140    163       LRR 1-3.
FT   REPEAT      164    187       LRR 1-4.
FT   REPEAT      188    211       LRR 1-5.
FT   REPEAT      212    230       LRR 1-6.
FT   REPEAT      327    337       LRR 2-1.
FT   REPEAT      338    361       LRR 2-2.
FT   REPEAT      362    385       LRR 2-3.
FT   REPEAT      386    409       LRR 2-4.
FT   REPEAT      410    433       LRR 2-5.
FT   REPEAT      434    452       LRR 2-6.
FT   REPEAT      551    562       LRR 3-1.
FT   REPEAT      563    586       LRR 3-2.
FT   REPEAT      587    610       LRR 3-3.
FT   REPEAT      611    634       LRR 3-4.
FT   REPEAT      635    653       LRR 3-5.
FT   REPEAT      747    757       LRR 4-1.
FT   REPEAT      758    781       LRR 4-2.
FT   REPEAT      782    805       LRR 4-3.
FT   REPEAT      806    829       LRR 4-4.
FT   REPEAT      830    848       LRR 4-5.
FT   DOMAIN      907    944       EGF-LIKE 1.
FT   DOMAIN      946    983       EGF-LIKE 2.
FT   DOMAIN      985   1022       EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1024   1062       EGF-LIKE 4.
FT   DOMAIN     1064   1100       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1111   1149       EGF-LIKE 6.
FT   DOMAIN     1353   1392       EGF-LIKE 7.
FT   DOMAIN     1409   1480       CTCK.
FT   VARSPLIC   1394   1404       MISSING (IN SHORT FORM).
FT   CARBOHYD    111    111       POTENTIAL.
FT   CARBOHYD    207    207       POTENTIAL.
FT   CARBOHYD    357    357       POTENTIAL.
FT   CARBOHYD    435    435       POTENTIAL.
FT   CARBOHYD    783    783       POTENTIAL.
FT   CARBOHYD    788    788       POTENTIAL.
FT   CARBOHYD    958    958       POTENTIAL.
FT   CARBOHYD    998    998       POTENTIAL.
FT   CARBOHYD   1060   1060       POTENTIAL.
FT   CARBOHYD   1159   1159       POTENTIAL.
FT   CARBOHYD   1175   1175       POTENTIAL.
FT   CARBOHYD   1243   1243       POTENTIAL.
FT   CARBOHYD   1292   1292       POTENTIAL.
FT   DISULFID    911    922       BY SIMILARITY.
FT   DISULFID    916    932       BY SIMILARITY.
FT   DISULFID    934    943       BY SIMILARITY.
FT   DISULFID    950    961       BY SIMILARITY.
FT   DISULFID    955    971       BY SIMILARITY.

FT   DISULFID   1409   1443       BY SIMILARITY.
FT   DISULFID   1423   1457       BY SIMILARITY.
FT   DISULFID   1434   1473       BY SIMILARITY.
FT   DISULFID   1438   1475       BY SIMILARITY.
FT   DISULFID   1442   1479       BY SIMILARITY.
SQ SEQUENCE 1480 AA; 165752 MW; 2CD1C421 CRC32;

Query Match 39.8%; Score 462; DB 1; Length 1480;

Best Local Similarity 42.7%; Pred. No. 7.20e-83;

Matches 67; Conservative 38; Mismatches 44; Indels 8; Gaps 4;

Db 1047 HYSCDCQAGFHGTNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMIS 1106

: 1: 1 ll::| III :|:||||: III :| 1 1 1 : |:| II |:
Qy 6 HHRCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCE----IP 61

Db 1107 MMYPQTSPCQNHECKHGVCFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELE 1166

::! 1: II :| III :|:| II: 1 II 1 |::|| ::::::
Qy 62 PA-PRSS-CEGTECQNGA--NCVDQGSRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFT 117

Db 1167 PLRTRPEANVTIVFSSAEQNGILMYDGQDAHLAVELF 1203

: ||:|: 1 : 1 1 : 1 1 1 1 : 1 : | : : |:||||:
Qy 118 DLQNWPRANITLQVSTAEDNGILLYNGDNDHIAVELY 154



RESULT 2

ID FBP1_STRPU STANDARD; PRT; 1064 AA.

AC P10079;

DT 01-MAR-1989 (REL. 10, CREATED)

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)

DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)

DE FIBROPELLIN I PRECURSOR (EPIDERMAL GROWTH FACTOR-RELATED PROTEIN 1)

DE (UEGF-1).

GN EGF1.

OS STRONGYLOCENTROTUS PURPURATUS (PURPLE SEA URCHIN).

OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA;

OC EUECHINOIDEA; ECHINACEA; ECHINOIDA; STRONGYLOCENTROTIDAE;

OC STRONGYLOCENTROTUS.

RN [1]

RP SEQUENCE FROM N.A.

RX MEDLINE; 90112459.

RA DELGADILLO-REYNOSO M.G., ROLLO D.R., HURSH D.A., RAFF R.A.;

RT "Structural analysis of the uEGF gene in the sea urchin

RT Strongylocentrotus purpuratus reveals more similarity to vertebrate

RT than to invertebrate genes with EGF-like repeats.";

RL J. MOL. EVOL. 29:314-327(1989).

RN [2]

RP SEQUENCE OF 279-476 AND 781-1064 FROM N.A.

RX MEDLINE; 87319677.

RA HURSH D.A., ANDREWS M.E., RAFF R.A.;

RT "A sea urchin gene encodes a polypeptide homologous to epidermal

RT growth factor.";

RL SCIENCE 237:1487-1490(1987).

RN [3]

RP AVIDIN-LIKE DOMAIN.

RX MEDLINE; 89196806.

RA HUNT L.T., BARKER W.C.;

RT "Avidin-like domain in an epidermal growth factor homolog from a sea
RT urchin.";

RL FASEB J. 3:1760-1764(1989). 

RN [4] 

RP CHARACTERIZATION.

RX MEDLINE; 91285254. 

RA BISGROVE B.W., ANDREWS M.E., RAFF R.A.; 

RT "Fibropellins, products of an EGF repeat-containing gene, form a 

RT unique extracellular matrix structure that surrounds the sea urchin 

RT embryo."; 

RL DEV. BIOL. 146:89-99(1991). 

CC -!- FUNCTION: FORM THE APICAL LAMINA, A COMPONENT OF THE EXTRACELLULAR 
CC MATRIX. 

CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR. IN VESICLES IN THE CYTOPLASM 
CC OF UNFERTILIZED EGGS, THEN TO THE BASE OF THE HYALIN LAYER 
CC THROUGHOUT DEVELOPMENT AND FINALLY IN THE APICAL LAMINA IN LATE 
CC EMBRYOS AND EARLY LARVAE. 

CC -!- DEVELOPMENTAL STAGE: MODERATE LEVELS IN UNFERTILIZED EGGS AND
CC DURING EARLY CLEAVAGE, THEN RAPIDLY INCREASES IN ABUNDANCE BETWEEN
CC LATE MORULA AND MESENCHYME BLASTULA STAGES TO MAXIMAL LEVELS
CC MAINTAINED THROUGH SUBSEQUENT STAGES. EXPRESSED BOTH MATERNALLY
CC AND ZYGOTICALLY.

CC -!- ALTERNATIVE PRODUCTS: TWO FORMS (IA AND IB) ARE PRODUCED BY
CC ALTERNATIVE SPLICING. THE SMALL FORM (IB) LACKS 8 EGF REPEATS.

CC -!- SIMILARITY: CONTAINS 21 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: CONTAINS 1 CUB DOMAIN.

CC -!- SIMILARITY: THE C-TERMINAL DOMAIN OF THIS PROTEIN IS SIMILAR
CC TO AVIDIN/STREPTAVIDIN.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; L08692; G161467; -. 

DR EMBL; L08692; G161466; -. 

DR EMBL; X17530; G667061; -.

DR EMBL; M17421; G552260; -. 

DR EMBL; X17533; G667062; -. 

DR PIR; A29316; A29316. 

DR PROSITE; PS00010; ASX_HYDROXYL; 19.

DR PROSITE; PS00022; EGF_1; 19.

DR PROSITE; PS00577; AVIDIN; 1.
DR PROSITE; PS01180; CUB; 1.
DR PROSITE; PS01186; EGF_2; 19.

DR PROSITE; PS01187; EGF_CA; 19.

DR PFAM; PF00008; EGF; 21. 

DR PFAM; PF00431; CUB; 1. 

DR HSSP; P01132; 1EPH. 

KW BIOTIN; ALTERNATIVE SPLICING; EGF-LIKE DOMAIN; REPEAT; SIGNAL; 

KW GLYCOPROTEIN. 



FT   SIGNAL        1     19       POTENTIAL.
FT   CHAIN        20   1064       FIBROPELLIN I.
FT   DOMAIN       20     55       EGF-LIKE 1.
FT   DOMAIN       62    175       CUB.
FT   DOMAIN      176    212       EGF-LIKE 2, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      214    250       EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      252    288       EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      290    326       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      328    364       EGF-LIKE 6, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      366    402       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      404    440       EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      442    478       EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      480    516       EGF-LIKE 10, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      518    554       EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      556    592       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      594    630       EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      632    668       EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      670    706       EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      708    744       EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      746    782       EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      784    820       EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      822    858       EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      860    896       EGF-LIKE 20.
FT   DOMAIN      898    934       EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      936   1064       AVIDIN-LIKE.
FT   DISULFID     23     34       BY SIMILARITY.
FT   DISULFID     28     43       BY SIMILARITY.
FT   DISULFID     45     54       BY SIMILARITY.
FT   DISULFID    180    191       BY SIMILARITY.
FT   DISULFID    185    200       BY SIMILARITY.
FT   DISULFID    202    211       BY SIMILARITY.
FT   DISULFID    218    229       BY SIMILARITY.
FT   DISULFID    223    238       BY SIMILARITY.
FT   DISULFID    240    249       BY SIMILARITY.
FT   DISULFID    256    267       BY SIMILARITY.
FT   DISULFID    261    276       BY SIMILARITY.
FT   DISULFID    278    287       BY SIMILARITY.
FT   DISULFID    294    305       BY SIMILARITY.
FT   DISULFID    299    314       BY SIMILARITY.
FT   DISULFID    316    325       BY SIMILARITY.
FT   DISULFID    332    343       BY SIMILARITY.
FT   DISULFID    337    352       BY SIMILARITY.
FT   DISULFID    354    363       BY SIMILARITY.
FT   DISULFID    370    381       BY SIMILARITY.
FT   DISULFID    375    390       BY SIMILARITY.
FT   DISULFID    392    401       BY SIMILARITY.
FT   DISULFID    408    419       BY SIMILARITY.
FT   DISULFID    413    428       BY SIMILARITY.
FT   DISULFID    430    439       BY SIMILARITY.
FT   DISULFID    446    457       BY SIMILARITY.
FT   DISULFID    451    466       BY SIMILARITY.
FT   DISULFID    468    477       BY SIMILARITY.
FT   DISULFID    484    495       BY SIMILARITY.
FT   DISULFID    489    504       BY SIMILARITY.
FT   DISULFID    506    515       BY SIMILARITY.
FT   DISULFID    522    533       BY SIMILARITY.
FT   DISULFID    527    542       BY SIMILARITY.
FT   DISULFID    544    553       BY SIMILARITY.
FT   DISULFID    560    571       BY SIMILARITY.
FT   DISULFID    565    580       BY SIMILARITY.
FT   DISULFID    582    591       BY SIMILARITY.
FT   DISULFID    598    609       BY SIMILARITY.
FT   DISULFID    603    618       BY SIMILARITY.
FT   DISULFID    620    629       BY SIMILARITY.
FT   DISULFID    636    647       BY SIMILARITY.
FT   DISULFID    641    656       BY SIMILARITY.
FT   DISULFID    658    667       BY SIMILARITY.
FT   DISULFID    674    685       BY SIMILARITY.
FT   DISULFID    679    694       BY SIMILARITY.
FT   DISULFID    696    705       BY SIMILARITY.
FT   DISULFID    712    723       BY SIMILARITY.
FT   DISULFID    717    732       BY SIMILARITY.
FT   DISULFID    734    743       BY SIMILARITY.
FT   DISULFID    750    761       BY SIMILARITY.
FT   DISULFID    755    770       BY SIMILARITY.
FT   DISULFID    772    781       BY SIMILARITY.
FT   DISULFID    788    799       BY SIMILARITY.
FT   DISULFID    793    808       BY SIMILARITY.
FT   DISULFID    810    819       BY SIMILARITY.
FT   DISULFID    826    837       BY SIMILARITY.
FT   DISULFID    831    846       BY SIMILARITY.
FT   DISULFID    848    857       BY SIMILARITY.
FT   DISULFID    864    875       BY SIMILARITY.
FT   DISULFID    869    884       BY SIMILARITY.
FT   DISULFID    886    895       BY SIMILARITY.
FT   DISULFID    902    913       BY SIMILARITY.
FT   DISULFID    907    922       BY SIMILARITY.
FT   DISULFID    924    933       BY SIMILARITY.
FT   VARSPLIC    477    780       MISSING (IN FORM IB).
FT   CARBOHYD     30     30       POTENTIAL.
FT   CARBOHYD    136    136       POTENTIAL.
FT   CARBOHYD    851    851       POTENTIAL.
FT   CONFLICT    279    279       L -> S (IN REF. 2).
SQ SEQUENCE 1064 AA; 112072 MW; FBD10D48 CRC32;

Query Match 27.9%; Score 324; DB 1; Length 1064; 

Best Local Similarity 46.7%; Pred. No. 4.14e-50;

Matches 43; Conservative 16; Mismatches 30; Indels 3; Gaps 2; 

Db 808 CACVPGFTGSNCETNIDECASDPCLNGGICVDGVNGFVCQCPPNYSGTYCEIS-L--DAC 864 

I I: hll II I hi I II: III II:: I I III III: :| 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 865 RSMPCQNGATCVNVGADYVCECVPGYAGQNCE 896 

: Mill II: I: l|:|:||::| :|| 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100 



RESULT 3

ID DLL1_MOUSE STANDARD; PRT; 722 AA.

AC Q61483; 

DT 01-NOV-1997 (REL. 35, CREATED)

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE) 

DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE) 

DE DELTA-LIKE PROTEIN 1 PRECURSOR (DELTA1).
GN DLL1.
OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=BALB/C X C57BL/6; TISSUE=EMBRYO;

RX MEDLINE; 95401858.

RA BETTENHAUSEN B., DE ANGELIS M.H., SIMON D., GUENET J.-L., GOSSLER A.;

RT "Transient and restricted expression during mouse embryogenesis of 

RT Dll1, a murine gene closely related to Drosophila Delta.";

RL DEVELOPMENT 121:2407-2418(1995).

CC -!- FUNCTION: MAY BE INVOLVED IN CELL-TO-CELL COMMUNICATION IN 

CC MAMMALIAN EMBRYOS. MAY HAVE A ROLE IN CELLULAR INTERACTIONS

CC UNDERLYING SOMITOGENESIS AND DEVELOPMENT OF THE NERVOUS SYSTEM. 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN. 

CC -!- TISSUE SPECIFICITY: IN THE EMBRYO, EXPRESSED IN THE PARAXIAL 

CC MESODERM AND NERVOUS SYSTEM. EXPRESSED AT HIGH LEVELS IN ADULT

CC HEART AND AT LOWER LEVELS IN ADULT LUNG.

CC -!- DEVELOPMENTAL STAGE: EXPRESSED UNTIL DAY 15 IN THE EMBRYO. 

CC EXPRESSION THEN DECREASES AND INCREASES AGAIN IN THE ADULT. 

CC -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: TO DROSOPHILA DELTA PROTEIN. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; X80903; G806570; -. 

DR MGD; MGI:104659; DLL1.

DR PROSITE; PS00010; ASX_HYDROXYL; 3.

DR PROSITE; PS00022; EGF_1; 8.

DR PROSITE; PS01186; EGF_2; 8.

DR PROSITE; PS01187; EGF_CA; 2.

DR PFAM; PF00008; EGF; 6. 

DR HSSP; P00740; 1IXA. 

KW SIGNAL; EGF-LIKE DOMAIN; GLYCOPROTEIN; TRANSMEMBRANE. 



FT   SIGNAL        1     17       POTENTIAL.
FT   CHAIN        18    722       DELTA-LIKE PROTEIN 1.
FT   DOMAIN       18    545       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    546    568       POTENTIAL.
FT   DOMAIN      569    722       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN      225    253       EGF-LIKE 1.
FT   DOMAIN      256    284       EGF-LIKE 2.
FT   DOMAIN      291    324       EGF-LIKE 3.
FT   DOMAIN      331    362       EGF-LIKE 4, CALCIUM BINDING (POTENTIAL).
FT   DOMAIN      369    401       EGF-LIKE 5.
FT   DOMAIN      408    439       EGF-LIKE 6.
FT   DOMAIN      446    477       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      484    515       EGF-LIKE 8.
FT   DISULFID    225    236       BY SIMILARITY.
FT   DISULFID    229    242       BY SIMILARITY.
FT   DISULFID    244    253       BY SIMILARITY.
FT   DISULFID    256    267       BY SIMILARITY.
FT   DISULFID    262    273       BY SIMILARITY.
FT   DISULFID    275    284       BY SIMILARITY.
FT   DISULFID    291    303       BY SIMILARITY.
FT   DISULFID    297    313       BY SIMILARITY.
FT   DISULFID    315    324       BY SIMILARITY.
FT   DISULFID    331    342       BY SIMILARITY.
FT   DISULFID    336    351       BY SIMILARITY.
FT   DISULFID    353    362       BY SIMILARITY.
FT   DISULFID    369    380       BY SIMILARITY.
FT   DISULFID    374    390       BY SIMILARITY.
FT   DISULFID    392    401       BY SIMILARITY.
FT   DISULFID    408    419       BY SIMILARITY.
FT   DISULFID    413    428       BY SIMILARITY.
FT   DISULFID    430    439       BY SIMILARITY.
FT   DISULFID    446    466       BY SIMILARITY.
FT   DISULFID    468    477       BY SIMILARITY.
FT   DISULFID    484    495       BY SIMILARITY.
FT   DISULFID    489    504       BY SIMILARITY.
FT   DISULFID    506    515       BY SIMILARITY.
FT   CARBOHYD    476    476       POTENTIAL.
SQ SEQUENCE 722 AA; 78448 MW; 5A647702 CRC32;



Query Match 27.7%; Score 321; DB 1; Length 722;

Best Local Similarity 43.8%; Pred. No. 2.05e-49;

Matches 42; Conservative 19; Mismatches 32; Indels 3; Gaps 2;

Db 428 CRCQAGFSGRYCEDNVDDCASSPCANGGTCRDSVNDFSCTCPPGYTGKNCS-APVSR--C 484

I I l::| I :l III MM! II ::| I ||:| | :| :| | 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 485 EHAPCHNGATCHQRGQRYMCECAQGYGGPNCQFLLP 520 

I : 1:111 I ::l |.:|:| hill:!: ||: 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLS 104 



RESULT 4 

ID DLL1_HUMAN STANDARD; PRT; 723 AA.
AC O00548;

DT 15-JUL-1998 (REL. 36, CREATED)
DT 15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE DELTA-LIKE PROTEIN 1 PRECURSOR (DELTA1).
GN DLL1.

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 
OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 
RN [1] 

RP SEQUENCE FROM N.A.

RA MANN R.S., GRAY G.E., HENRIQUE D., ISH-HOROWICZ D.,
RA ARTAVANIS-TSAKONAS S.;

RL SUBMITTED (MAY-1997) TO EMBL/GENBANK/DDBJ DATA BANKS. 

CC -!- FUNCTION: MAY BE INVOLVED IN CELL-TO-CELL COMMUNICATION IN
CC MAMMALIAN EMBRYOS. MAY HAVE A ROLE IN CELLULAR INTERACTIONS
CC UNDERLYING SOMITOGENESIS AND DEVELOPMENT OF THE NERVOUS SYSTEM (BY
CC SIMILARITY).
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: TO DROSOPHILA DELTA PROTEIN.
CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 
CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC

DR EMBL; AF003522; G2197069; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 3.

DR PROSITE; PS00022; EGF_1; 8.

DR PROSITE; PS01186; EGF_2; 8.

DR PROSITE; PS01187; EGF_CA; 1. 

DR PFAM; PF00008; EGF; 6. 

DR HSSP; P00740; 1IXA. 



KW SIGNAL; EGF-LIKE DOMAIN; GLYCOPROTEIN; TRANSMEMBRANE.



FT   SIGNAL        1     17       POTENTIAL.
FT   CHAIN        18    723       DELTA-LIKE PROTEIN 1.
FT   DOMAIN       18    545       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    546    568       POTENTIAL.
FT   DOMAIN      569    723       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN      226    254       EGF-LIKE 1.
FT   DOMAIN      257    285       EGF-LIKE 2.
FT   DOMAIN      292    325       EGF-LIKE 3.
FT   DOMAIN      332    363       EGF-LIKE 4, CALCIUM BINDING (POTENTIAL).
FT   DOMAIN      370    402       EGF-LIKE 5.
FT   DOMAIN      409    440       EGF-LIKE 6.
FT   DOMAIN      447    478       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      485    516       EGF-LIKE 8.
FT   DISULFID    226    237       BY SIMILARITY.
FT   DISULFID    230    243       BY SIMILARITY.
FT   DISULFID    245    254       BY SIMILARITY.
FT   DISULFID    257    268       BY SIMILARITY.
FT   DISULFID    263    274       BY SIMILARITY.
FT   DISULFID    276    285       BY SIMILARITY.
FT   DISULFID    292    304       BY SIMILARITY.
FT   DISULFID    298    314       BY SIMILARITY.
FT   DISULFID    316    325       BY SIMILARITY.
FT   DISULFID    332    343       BY SIMILARITY.
FT   DISULFID    337    352       BY SIMILARITY.
FT   DISULFID    354    363       BY SIMILARITY.
FT   DISULFID    370    381       BY SIMILARITY.
FT   DISULFID    375    391       BY SIMILARITY.
FT   DISULFID    393    402       BY SIMILARITY.
FT   DISULFID    409    420       BY SIMILARITY.
FT   DISULFID    414    429       BY SIMILARITY.
FT   DISULFID    431    440       BY SIMILARITY.
FT   DISULFID    447    467       BY SIMILARITY.
FT   DISULFID    469    478       BY SIMILARITY.
FT   DISULFID    485    496       BY SIMILARITY.
FT   DISULFID    490    505       BY SIMILARITY.
FT   DISULFID    507    516       BY SIMILARITY.
FT   CARBOHYD    477    477       POTENTIAL.
SQ SEQUENCE 723 AA; 77956 MW; UD48BDB CRC32;



Query Match 27.7%; Score 321; DB 1; Length 723;

Best Local Similarity 43.8%; Pred. No. 2.05e-49;

Matches 42; Conservative 20; Mismatches 31; Indels 3; Gaps 2; 

Db 429 CRCQAGFSGRHCDDNVDDCASSPCANGGTCRDGVNDFSCTCPPGYTGRNCS-APVSR--C 485 

I I :l :l III Mh I I II ::| I Ihh I :| :| I 

Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 486 EHAPCHNGATCHERGHGYVCECARGYGGPNCQFLLP 521 

I : hill I I; |:|||:|: ||: 

Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLS 104 



RESULT 5 

ID DLL1_RAT STANDARD; PRT; 714 AA.

AC P97677; 

DT 01-NOV-1997 (REL. 35, CREATED) 

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE) 

DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)

DE DELTA-LIKE PROTEIN 1 PRECURSOR (DELTA1).

GN DLL1.

OS RATTUS NORVEGICUS (RAT).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.

RN [1]



RP SEQUENCE FROM N.A. 

RA DISIBIO G., HEBSHI L., BOULTER J., WEINMASTER G.; 

RL SUBMITTED (DEC-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.

CC -!- FUNCTION: MAY BE INVOLVED IN CELL-TO-CELL COMMUNICATION IN 

CC MAMMALIAN EMBRYOS. MAY HAVE A ROLE IN CELLULAR INTERACTIONS 

CC UNDERLYING SOMITOGENESIS AND DEVELOPMENT OF THE NERVOUS SYSTEM (BY 

CC SIMILARITY).

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: TO DROSOPHILA DELTA PROTEIN.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; U78889; G1699046; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 3.

DR PROSITE; PS00022; EGF_1; 8.

DR PROSITE; PS01186; EGF_2; 8.

DR PROSITE; PS01187; EGF_CA; 2.

DR PFAM; PF00008; EGF; 6. 

DR HSSP; P00740; 1IXA. 

KW SIGNAL; EGF-LIKE DOMAIN; GLYCOPROTEIN; TRANSMEMBRANE.



FT   SIGNAL        1     17       POTENTIAL.
FT   CHAIN        18    714       DELTA-LIKE PROTEIN 1.
FT   DOMAIN       18    537       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    538    560       POTENTIAL.
FT   DOMAIN      561    714       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN      225    253       EGF-LIKE 1.
FT   DOMAIN      256    284       EGF-LIKE 2.
FT   DOMAIN      291    324       EGF-LIKE 3.
FT   DOMAIN      331    362       EGF-LIKE 4, CALCIUM BINDING (POTENTIAL).
FT   DOMAIN      369    401       EGF-LIKE 5.
FT   DOMAIN      408    439       EGF-LIKE 6.
FT   DOMAIN      446    477       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      484    515       EGF-LIKE 8.
FT   DISULFID    225    236       BY SIMILARITY.
FT   DISULFID    229    242       BY SIMILARITY.
FT   DISULFID    244    253       BY SIMILARITY.
FT   DISULFID    256    267       BY SIMILARITY.
FT   DISULFID    262    273       BY SIMILARITY.
FT   DISULFID    275    284       BY SIMILARITY.
FT   DISULFID    291    303       BY SIMILARITY.
FT   DISULFID    297    313       BY SIMILARITY.
FT   DISULFID    315    324       BY SIMILARITY.
FT   DISULFID    331    342       BY SIMILARITY.
FT   DISULFID    336    351       BY SIMILARITY.
FT   DISULFID    353    362       BY SIMILARITY.
FT   DISULFID    369    380       BY SIMILARITY.
FT   DISULFID    374    390       BY SIMILARITY.
FT   DISULFID    392    401       BY SIMILARITY.
FT   DISULFID    408    419       BY SIMILARITY.
FT   DISULFID    413    428       BY SIMILARITY.
FT   DISULFID    430    439       BY SIMILARITY.
FT   DISULFID    446    466       BY SIMILARITY.
FT   DISULFID    468    477       BY SIMILARITY.
FT   DISULFID    484    495       BY SIMILARITY.
FT   DISULFID    489    504       BY SIMILARITY.
FT   DISULFID    506    515       BY SIMILARITY.
FT   CARBOHYD    476    476       POTENTIAL.



SQ SEQUENCE 714 AA; 77378 MW; 604B76D1 CRC32; 

Query Match 27.2%; Score 315; DB 1; Length 714;

Best Local Similarity 42.7%; Pred. No. 5.03e-48;

Matches 41; Conservative 21; Mismatches 31; Indels 3; Gaps 2; 

Db 428 CRCQTGFSGRYCEDNVDDCASSPCANGGTCRDSVNDFSCTCPPGYTGRNCS-APVSR--C 484 

I I |::l | :| IN | ||: | | I ::| I ||:|: I :| :| I 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 
Db 485 EHAPCHNGATCHQRGQRYMCECAQGYGGANCQFLLP 520

I : hill I ::| I :|:| |:||::|: lh 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLS 104 



RESULT 6 

ID FBP3_STRPU STANDARD; PRT; 570 AA.

AC P49013;

DT 01-FEB-1996 (REL. 33, CREATED)

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)

DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)

DE FIBROPELLIN C PRECURSOR (EPIDERMAL GROWTH FACTOR-RELATED PROTEIN 3)

DE (EGF III) (FIBROPELLIN III).

GN EGF3.

OS STRONGYLOCENTROTUS PURPURATUS (PURPLE SEA URCHIN). 

OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; 

OC EUECHINOIDEA; ECHINACEA; ECHINOIDA; STRONGYLOCENTROTIDAE; 

OC STRONGYLOCENTROTUS.

RN [1]

RP SEQUENCE FROM N.A.

RC TISSUE=GASTRULA;
RX MEDLINE; 93273088.
RA BISGROVE B.W., RAFF R.A.;

RT "The SpEGF III gene encodes a member of the fibropellins: EGF repeat-

RT containing proteins that form the apical lamina of the sea urchin 

RT embryo."; 

RL DEV. BIOL. 157:526-538(1993). 

CC -!- FUNCTION: FORM THE APICAL LAMINA, A COMPONENT OF THE EXTRACELLULAR
CC MATRIX. 

CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR. 

CC -!- DEVELOPMENTAL STAGE: LOW LEVELS IN UNFERTILIZED EGGS AND DURING

CC EARLY CLEAVAGE, THEN RAPIDLY INCREASES IN ABUNDANCE BETWEEN LATE

CC MORULA AND MESENCHYME BLASTULA STAGES TO MAXIMAL LEVELS MAINTAINED

CC THROUGH SUBSEQUENT STAGES.

CC -!- EXPRESSED BOTH MATERNALLY AND ZYGOTICALLY.

CC -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 1 CUB DOMAIN.

CC -!- SIMILARITY: THE C-TERMINAL DOMAIN OF THIS PROTEIN IS SIMILAR
CC TO AVIDIN/STREPTAVIDIN.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; L07045; G310660; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 8.
DR PROSITE; PS00022; EGF_1; 8.

DR PROSITE; PS00577; AVIDIN; 1.

DR PROSITE; PS01180; CUB; 1.

DR PROSITE; PS01186; EGF_2; 7.

DR PROSITE; PS01187; EGF_CA; 6.

DR PFAM; PF00008; EGF; 8.

DR PFAM; PF00431; CUB; 1.

DR HSSP; P00740; 1IXA.

KW BIOTIN; EGF-LIKE DOMAIN; REPEAT; SIGNAL; GLYCOPROTEIN. 



FT   SIGNAL        1     17       POTENTIAL.
FT   CHAIN        18    570       FIBROPELLIN C.
FT   DOMAIN       18     55       EGF-LIKE 1.
FT   DOMAIN       62    175       CUB.
FT   DOMAIN      176    212       EGF-LIKE 2, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      214    250       EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      252    288       EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      290    326       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      328    364       EGF-LIKE 6, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      366    402       EGF-LIKE 7.
FT   DOMAIN      404    440       EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      442    570       AVIDIN-LIKE.
FT   DISULFID     23     34       BY SIMILARITY.
FT   DISULFID     28     43       BY SIMILARITY.
FT   DISULFID     45     54       BY SIMILARITY.
FT   DISULFID    180    191       BY SIMILARITY.
FT   DISULFID    185    200       BY SIMILARITY.
FT   DISULFID    202    211       BY SIMILARITY.
FT   DISULFID    218    229       BY SIMILARITY.
FT   DISULFID    223    238       BY SIMILARITY.
FT   DISULFID    240    249       BY SIMILARITY.
FT   DISULFID    256    267       BY SIMILARITY.
FT   DISULFID    261    276       BY SIMILARITY.
FT   DISULFID    278    287       BY SIMILARITY.
FT   DISULFID    294    305       BY SIMILARITY.
FT   DISULFID    299    314       BY SIMILARITY.
FT   DISULFID    316    325       BY SIMILARITY.
FT   DISULFID    332    343       BY SIMILARITY.
FT   DISULFID    337    352       BY SIMILARITY.
FT   DISULFID    354    363       BY SIMILARITY.
FT   DISULFID    370    381       BY SIMILARITY.
FT   DISULFID    375    390       BY SIMILARITY.
FT   DISULFID    392    401       BY SIMILARITY.
FT   DISULFID    408    419       BY SIMILARITY.
FT   DISULFID    413    428       BY SIMILARITY.
FT   DISULFID    430    439       BY SIMILARITY.
FT   CARBOHYD     30     30       POTENTIAL.
FT   CARBOHYD    136    136       POTENTIAL.
FT   CARBOHYD    357    357       POTENTIAL.
SQ SEQUENCE 570 AA; 61116 MW; 265BC4BB CRC32;



Query Match 26.7%; Score 310; DB 1; Length 570;

Best Local Similarity 42.4%; Pred. No. 7.18e-47;

Matches 39; Conservative 19; Mismatches 31; Indels 3; Gaps 2;

Db 314 CDCRAGFTGSNCETNINECASSPCLNGGSCLDGVDGYVCQCLPNYTGTHCEIS-L--DAC 370

1:1 1:11 II I ::| I II: hi l::l I |: hi III: :| 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 371 ASLPCQNGGVCTNVGGDYVCECLPGYTGINCE 402 



Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100 



RESULT 7 

ID NOTC_DROME STANDARD; PRT; 2703 AA.

AC P07207; P04154;

DT 01-NOV-1986 (REL. 03, CREATED)

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS NOTCH PROTEIN PRECURSOR.

GN N.

OS DROSOPHILA MELANOGASTER (FRUIT FLY).

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 86079539. 

RA WHARTON K.A., JOHANSEN K.M., XU T., ARTAVANIS-TSAKONAS S.; 

RT "Nucleotide sequence from the neurogenic locus notch implies a gene 

RT product that shares homology with proteins containing EGF-like

RT repeats."; 

RL CELL 43:567-581(1985). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=OREGON-R;

RX MEDLINE; 87064624. 

RA KIDD S., KELLEY M.R., YOUNG M.W.; 

RT "Sequence of the notch locus of Drosophila melanogaster: relationship

RT of the encoded protein to mammalian clotting and growth factors.";

RL MOL. CELL. BIOL. 6:3094-3108(1986). 

RN [3] 

RP SEQUENCE OF 2505-2611 FROM N.A. 

RX MEDLINE; 85099329. 

RA WHARTON K.A., YEDVOBNICK B., FINNERTY V.G., ARTAVANIS-TSAKONAS S.;
RT "opa: a novel family of transcribed repeats shared by the Notch locus

RT and other developmentally regulated loci in D. melanogaster.";

RL CELL 40:55-62(1985). 

RN [4] 

RP SEQUENCE OF 1-8 FROM N.A. 

RX MEDLINE; 87257846. 

RA KELLEY M.R., KIDD S., BERG R.L., YOUNG M.W.; 

RT "Restriction of P-element insertions at the Notch locus of Drosophila 

RT melanogaster."; 

RL MOL. CELL. BIOL. 7:1545-1548(1987). 

RN [5] 

RP REVIEW. 

RA HARRIS W.A.;

RT "Many cell types specified by Notch function."; 

RL CURR. BIOL. 1:120-122(1991). 

CC -!- FUNCTION: NOTCH PROTEIN IS ESSENTIAL FOR PROPER DIFFERENTIATION OF
CC ECTODERM.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- SEPARATION OF NEUROBLASTS FROM THE ECTODERM INTO THE INNER PART

CC OF EMBRYO IS ONE OF THE FIRST STEPS OF CNS DEVELOPMENT IN INSECTS.

CC THIS PROCESS IS UNDER CONTROL OF THE NEUROGENIC GENES. 

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS. 

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS. 

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; M16152; G157988; -.

DR EMBL; M16153; G157988; JOINED.

DR EMBL; M16149; G157988; JOINED.

DR EMBL; M16150; G157988; JOINED.

DR EMBL; M16151; G157988; JOINED.

DR EMBL; K03508; G157993; -.

DR EMBL; M13689; G157993; JOINED.

DR EMBL; K03507; G157993; JOINED.

DR EMBL; M12175; G950317; -.

DR EMBL; M16025; G157995; -.

DR PIR; A24420; A24420.

DR PIR; A24768; A24768.

DR PIR; A05267; A05267.

DR FLYBASE; FBgn0004647; N.

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 28.

DR PROSITE; PS01187; EGF_CA; 22.

DR PFAM; PF00008; EGF; 36.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3.

DR HSSP; P00740; 1IXA.

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.

Note: remainder of annotations omitted.

Query Match 25.9%; Score 301; DB 1; Length 2703;

Best Local Similarity 40.7%; Pred. No. 8.47e-45; 

Matches 37; Conservative 19; Mismatches 32; Indels 3; Gaps 2; 

Db 1008 CTCPWFSGINCQTNDEDCTESSCLNGGSCIDGINGYNCSCLAGYSGANCQYKLN-K--C 1064 

I I II::! II l::ll : I II: hi :hl I |: llll I: : I 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 1065 DSNPCLNGATCHEQNNEYTCHCPSGFTGKQC 1095 

I III I : : |:| :|| | :| 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPEC 99 



RESULT 8 

ID NOTC_BRARE STANDARD; PRT; 2437 AA.

AC P46530; 

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN PRECURSOR.

GN NOTCH.

OS BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII;

OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA;
OC CYPRINIDAE; RASBORINAE; DANIO.
RN [1]

RP SEQUENCE FROM N.A.

RC TISSUE=EMBRYO;

RX MEDLINE; 94128602.

RA BIERKAMP C., CAMPOS-ORTEGA J.A.;

RT "A zebrafish homologue of the Drosophila neurogenic gene Notch and 

RT its pattern of transcription during early embryogenesis."; 

RL MECH. DEV. 43:87-100(1993). 

CC -!- FUNCTION: IMPLICATED IN CELL FATE SPECIFICATIONS DURING
CC EMBRYO DEVELOPMENT. MAY BE INVOLVED IN THE FORMATION OF THE
CC NEURAL PLATE, NOTOCHORD AND BRAIN VESICLES.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- DEVELOPMENTAL STAGE: EXPRESSED IN ALL CELLS IN PREGASTRULATION
CC STAGES. DURING GASTRULATION IS DIFFERENTIALLY EXPRESSED,
CC ACCUMULATING PREDOMINANTLY IN THE PRECHORDAL MESODERM AND
CC NOTOCHORD. AT THE END OF GASTRULATION, EXPRESSED ALONG THE
CC ANTERIOR-POSTERIOR AXIS INCLUDING THE DEVELOPING NEURAL PLATE
CC AND DIFFERENTIATING MESODERM. ALSO PRESENT IN THE DEVELOPING
CC BRAIN AND HEAD REGIONS.

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

DR EMBL; X69088; G433867; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 23.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 28.
DR PROSITE; PS01187; EGF_CA; 22.
DR PFAM; PF00008; EGF; 36.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.


FT   DISULFID    439    448       BY SIMILARITY.
FT   DISULFID    455    466       BY SIMILARITY.
FT   DISULFID    460    475       BY SIMILARITY.
FT   DISULFID    477    486       BY SIMILARITY.
FT   DISULFID           503       BY SIMILARITY.
FT   DISULFID    498              BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID    530              BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID           561       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID           587       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID           616       BY SIMILARITY.
FT   DISULFID    610    625       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID    643    653       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID    664    673       BY SIMILARITY.
FT   DISULFID    680    691       BY SIMILARITY.
FT   DISULFID    685    700       BY SIMILARITY.


FT   DISULFID    702    711       BY SIMILARITY.
FT   DISULFID    718    728       BY SIMILARITY.
FT   DISULFID    723    737       BY SIMILARITY.
FT   DISULFID    739    748       BY SIMILARITY.
FT   DISULFID    755    766       BY SIMILARITY.
FT   DISULFID    760    775       BY SIMILARITY.
FT   DISULFID    777    786       BY SIMILARITY.
FT   DISULFID    793    804       BY SIMILARITY.
FT   DISULFID    798    813       BY SIMILARITY.
FT   DISULFID    815    824       BY SIMILARITY.
FT   DISULFID    831    842       BY SIMILARITY.
FT   DISULFID    836    853       BY SIMILARITY.
FT   DISULFID    855    864       BY SIMILARITY.
FT   DISULFID    871    882       BY SIMILARITY.
FT   DISULFID    876    891       BY SIMILARITY.
FT   DISULFID    893    902       BY SIMILARITY.
FT   DISULFID    909    920       BY SIMILARITY.
FT   DISULFID    914    929       BY SIMILARITY.
FT   DISULFID    931    940       BY SIMILARITY.
FT   DISULFID    947    958       BY SIMILARITY.
FT   DISULFID    952    967       BY SIMILARITY.
FT   DISULFID    969    978       BY SIMILARITY.
FT   DISULFID   1023   1034       BY SIMILARITY.
FT   DISULFID   1028   1043       BY SIMILARITY.
FT   DISULFID   1045   1054       BY SIMILARITY.
FT   DISULFID   1061   1072       BY SIMILARITY.
FT   DISULFID   1066   1081       BY SIMILARITY.
FT   DISULFID   1083   1092       BY SIMILARITY.
FT   DISULFID   1099   1120       BY SIMILARITY.
FT   DISULFID   1114   1129       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID   1147   1158       BY SIMILARITY.
FT   DISULFID   1152   1167       BY SIMILARITY.
FT   DISULFID   1169   1178       BY SIMILARITY.
FT   DISULFID   1185   1196       BY SIMILARITY.
FT   DISULFID   1190   1205       BY SIMILARITY.
FT   DISULFID   1207   1216       BY SIMILARITY.
FT   DISULFID   1223   1242       BY SIMILARITY.
FT   DISULFID   1236   1251       BY SIMILARITY.
FT   DISULFID   1253   1262       BY SIMILARITY.



Note: remainder of annotations omitted. 



Query Match 25.8%; Score 299; DB 1; Length 2437; 

Best Local Similarity 44.1%; Pred. No. 2.44e-44;

Matches 41; Conservative 14; Mismatches 34; Indels 4; Gaps 3;

Db 474 HCICMPGYEGVFCQINSDDCASQPCLNG-KCIDKINSFHCECPKGFSGSLCQVD-V--DE 529

:| II II I I I III : I II hi :lh I I h i II:: 
Qy 8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSS 67 

Db 530 CASTPCKNGAKCTDGPNKYTCECTPGFSGIHCE 562 

I :| I llhl I :: hi llhl II 
Qy 68 CEGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100 



RESULT 9 

ID NTC3_MOUSE STANDARD; PRT; 2318 AA.

AC Q61982; 

DT 01-NOV-1997 (REL. 35, CREATED) 

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH 3 PROTEIN. 

GN NOTCH3.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=ICR X SWISS WEBSTER;

RX MEDLINE; 95001556.

RA LARDELLI M., DALSTRAND J., LENDAHL U.;

RT "The novel Notch homologue mouse Notch 3 lacks specific epidermal

RT growth factor-repeats and is expressed in proliferating

RT neuroepithelium.";

RL MECH. DEV. 46:123-136(1994). 

CC -!- FUNCTION: NOTCH 1, 2 AND 3 PLAY A COMBINATIONAL ROLE DURING 
CC VARIOUS CELL FATE DECISIONS AND MORPHOLOGICAL MOVEMENTS IN THE 
CC DEVELOPING CNS AND PROBABLY OTHER REGIONS OF THE EMBRYO. 

CC -!- TISSUE SPECIFICITY: PROLIFERATING NEUROEPITHELIUM. 

CC -!- DEVELOPMENTAL STAGE: CNS DEVELOPMENT. 

CC -!- SIMILARITY: CONTAINS 34 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS. 

CC -!- SIMILARITY: CONTAINS 6 CDC10/SWI6 REPEATS. 

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).
DR EMBL; X74760; G483581; -.

DR MGD; MGI:99460; NOTCH3.

DR PROSITE; PS00010; ASX_HYDROXYL; 18.

DR PROSITE; PS00022; EGF_1; 33.

DR PROSITE; PS01186; EGF_2; 27.

DR PROSITE; PS01187; EGF_CA; 17.

DR PFAM; PF00008; EGF; 33.

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE; 

KW GLYCOPROTEIN. 





FT   DOMAIN        1   1643       EXTRACELLULAR.
FT   TRANSMEM   1644   1664
FT   DOMAIN                       CYTOPLASMIC.
FT   DOMAIN       39   1374       34 X EGF-TYPE REPEATS.
FT   DOMAIN     1388   1503       3 X LIN/NOTCH REPEATS.
FT   DOMAIN     1784   1998       6 X CDC10/SWI6 REPEATS.
FT   DOMAIN     2242   2261       PEST.
FT   DOMAIN       39     78       EGF-LIKE 1.
FT   DOMAIN       79    119       EGF-LIKE 2.
FT   DOMAIN      120    157
FT   DOMAIN      159    196       EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      198
FT   DOMAIN      237    273       EGF-LIKE 6, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      275    313       EGF-LIKE 7.
FT   DOMAIN      315    351       EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN                       EGF-LIKE 9.
FT   DOMAIN      392    430       EGF-LIKE 10, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      432              EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      470    506       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      508    544       EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      546    581       EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      583    619       EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      621    656       EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      658    694       EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      696    731       EGF-LIKE 18.
FT   DOMAIN      735    771       EGF-LIKE 19.
FT   DOMAIN      772    809       EGF-LIKE 20.
FT   DOMAIN      811    848
FT   DOMAIN      850    886       EGF-LIKE 22, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      888    923       EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      925    961       EGF-LIKE 24.
FT   DOMAIN      963    999       EGF-LIKE 25.
FT   DOMAIN     1001   1035       EGF-LIKE 26.
FT   DOMAIN     1037   1083       EGF-LIKE 27.
FT   DOMAIN     1085   1121       EGF-LIKE 28.
FT   DOMAIN     1123   1159       EGF-LIKE 29, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1161   1204       EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1206   1245       EGF-LIKE 31.
FT   DOMAIN     1247   1288       EGF-LIKE 32.
FT   DOMAIN     1290   1326       EGF-LIKE 33.
FT   DOMAIN     1336   1374       EGF-LIKE 34.
FT   REPEAT     1388   1428       LIN/NOTCH 1.
FT   REPEAT     1429   1467       LIN/NOTCH 2.
FT   REPEAT     1468   1503       LIN/NOTCH 3.
FT   REPEAT     1784   1816       CDC10/SWI6 1.
FT   REPEAT     1817   1865       CDC10/SWI6 2.
FT   REPEAT     1866   1898       CDC10/SWI6 3.
FT   REPEAT     1899   1932       CDC10/SWI6 4.
FT   REPEAT     1933   1965       CDC10/SWI6 5.
FT   REPEAT     1966   1998       CDC10/SWI6 6.


FT   DISULFID     43     55       BY SIMILARITY.
FT   DISULFID     49     66       BY SIMILARITY.
FT   DISULFID     68     77       BY SIMILARITY.
FT   DISULFID     83     94       BY SIMILARITY.
FT   DISULFID     88    107       BY SIMILARITY.
FT   DISULFID    109    118       BY SIMILARITY.
FT   DISULFID    124    135       BY SIMILARITY.
FT   DISULFID    129    145       BY SIMILARITY.
FT   DISULFID    147    156       BY SIMILARITY.
FT   DISULFID    163    175       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID    207    223       BY SIMILARITY.
FT   DISULFID           234       BY SIMILARITY.
FT   DISULFID           252       BY SIMILARITY.
FT   DISULFID    246    261       BY SIMILARITY.
FT   DISULFID    263    272       BY SIMILARITY.
FT   DISULFID    279    292       BY SIMILARITY.
FT   DISULFID    286    301       BY SIMILARITY.
FT   DISULFID    303    312       BY SIMILARITY.
FT   DISULFID    319    330       BY SIMILARITY.
FT   DISULFID    324    339       BY SIMILARITY.
FT   DISULFID    341    350       BY SIMILARITY.
FT   DISULFID    356    367       BY SIMILARITY.
FT   DISULFID    361    378       BY SIMILARITY.
FT   DISULFID    380    389       BY SIMILARITY.
FT   DISULFID    396    409       BY SIMILARITY.
FT   DISULFID    403    418       BY SIMILARITY.
FT   DISULFID    420    429       BY SIMILARITY.
FT   DISULFID    436    447       BY SIMILARITY.
FT   DISULFID    441    456       BY SIMILARITY.
FT   DISULFID    458    467       BY SIMILARITY.
FT   DISULFID    474    485       BY SIMILARITY.
FT   DISULFID    479    494       BY SIMILARITY.
FT   DISULFID    496    505       BY SIMILARITY.
FT   DISULFID    512    523       BY SIMILARITY.
FT   DISULFID    517    532       BY SIMILARITY.
FT   DISULFID    534    543       BY SIMILARITY.
FT   DISULFID    550    560       BY SIMILARITY.
FT   DISULFID    555    569       BY SIMILARITY.
FT   DISULFID    571    580       BY SIMILARITY.
FT   DISULFID    587    598       BY SIMILARITY.
FT   DISULFID    592    607       BY SIMILARITY.
FT   DISULFID    609    618       BY SIMILARITY.
FT   DISULFID    625    635       BY SIMILARITY.
FT   DISULFID    630    644       BY SIMILARITY.
FT   DISULFID    646    655       BY SIMILARITY.
FT   DISULFID    662    673       BY SIMILARITY.
FT   DISULFID    667    682       BY SIMILARITY.
FT   DISULFID    684    693       BY SIMILARITY.
FT   DISULFID    700    710       BY SIMILARITY.
FT   DISULFID    705    719       BY SIMILARITY.
FT   DISULFID    721    730       BY SIMILARITY.
FT   DISULFID    739    750       BY SIMILARITY.
FT   DISULFID    744    759       BY SIMILARITY.
FT   DISULFID    761    770       BY SIMILARITY.
FT   DISULFID    776    787       BY SIMILARITY.
FT   DISULFID    781    797       BY SIMILARITY.
FT   DISULFID    799    808       BY SIMILARITY.
FT   DISULFID    815    827       BY SIMILARITY.
FT   DISULFID    821    836       BY SIMILARITY.
FT   DISULFID    838    847       BY SIMILARITY.
FT   DISULFID    854    865       BY SIMILARITY.
FT   DISULFID    859    874       BY SIMILARITY.
FT   DISULFID    876    885       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID    897    911       BY SIMILARITY.
FT   DISULFID           922       BY SIMILARITY.
FT   DISULFID    929    940       BY SIMILARITY.
FT   DISULFID    934    949       BY SIMILARITY.
FT   DISULFID    951    960       BY SIMILARITY.
FT   DISULFID    967    978       BY SIMILARITY.
FT   DISULFID    972    987       BY SIMILARITY.
FT   DISULFID    989    998       BY SIMILARITY.
FT   DISULFID   1005   1016       BY SIMILARITY.
FT   DISULFID   1010   1023       BY SIMILARITY.
FT   DISULFID   1025   1034       BY SIMILARITY.
FT   DISULFID   1041   1062       BY SIMILARITY.
FT   DISULFID   1056   1071       BY SIMILARITY.
FT   DISULFID   1073   1082       BY SIMILARITY.
FT   DISULFID   1089   1100       BY SIMILARITY.
FT   DISULFID   1094   1109       BY SIMILARITY.
FT   DISULFID   1111   1120       BY SIMILARITY.
FT   DISULFID   1127   1138       BY SIMILARITY.
FT   DISULFID   1132   1147       BY SIMILARITY.
FT   DISULFID   1149   1158       BY SIMILARITY.
FT   DISULFID   1165   1183       BY SIMILARITY.
FT   DISULFID   1177   1192       BY SIMILARITY.
FT   DISULFID   1194   1203       BY SIMILARITY.
FT   DISULFID   1210   1223       BY SIMILARITY.
FT   DISULFID   1215   1233       BY SIMILARITY.
FT   DISULFID   1235   1244       BY SIMILARITY.
FT   DISULFID   1251   1262       BY SIMILARITY.
FT   DISULFID   1256   1276       BY SIMILARITY.
FT   DISULFID   1278   1287       BY SIMILARITY.
FT   DISULFID   1294   1305       BY SIMILARITY.
FT   DISULFID   1299   1314       BY SIMILARITY.
FT   DISULFID   1316   1325       BY SIMILARITY.
FT   DISULFID   1340   1351       BY SIMILARITY.
FT   DISULFID   1345   1362       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.



Note: remainder of annotations omitted. 

Query Match 24.8%; Score 288; DB 1; Length 2318; 

Best Local Similarity 41.9%; Pred. No. 8.01e-42;

Matches 39; Conservative 16; Mismatches 35; Indels 3; Gaps 2; 

Db 456 CICMAGFTGTYCEVDIDECQSSPCVNGGVCKDRVNGFSCTCPSGFSGSMCQLD-V-DEC 512 

I II hll I : 1:1 I II: I I l|:::| I |:|| :|:: | 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 513 ASTPCRNGAKCVDQPDGYECRCAEGFEGTLCER 545 

:| hllhllll hi II I II: 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEK 101 



RESULT 10 

ID CRB_DROME STANDARD; PRT; 2139 AA.
AC P10040;

DT 01-MAR-1989 (REL. 10, CREATED)

DT 01-MAY-1991 (REL. 18, LAST SEQUENCE UPDATE)

DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE)

DE CRUMBS PROTEIN PRECURSOR (95F).

GN CRB.

OS DROSOPHILA MELANOGASTER (FRUIT FLY).

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.
RN [1]

RP SEQUENCE FROM N.A.
RC STRAIN=OREGON-R; TISSUE=EMBRYO;
RX MEDLINE; 90263104.
RA TEPASS U., THERES C., KNUST E.;

RT "Crumbs encodes an EGF-like protein expressed on apical membranes of
RT Drosophila epithelial cells and required for organization of
RT epithelia.";
RL CELL 61:787-799(1990).



RN [2]

RP SEQUENCE OF 1663-1955 FROM N.A.

RX MEDLINE; 87218537.

RA KNUST E., DIETRICH U., TEPASS U., BREMER K.A., WEIGEL D.,

RA VAESSIN H., CAMPOS-ORTEGA J.A.;

RT "EGF homologous sequences encoded in the genome of Drosophila

RT melanogaster, and their relation to neurogenic genes.";

RL EMBO J. 6:761-766(1987).

CC -!- FUNCTION: MAY PLAY A ROLE IN THE DEVELOPMENT OF EPITHELIA,

CC POSSIBLY FOR THE ESTABLISHMENT AND/OR MAINTENANCE OF CELL

CC POLARITY. IT MAY ACT AS A SIGNAL.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- PTM: PHOSPHORYLATED IN THE CYTOPLASMIC DOMAIN (POTENTIAL).

CC -!- SIMILARITY: CONTAINS 29 EGF-LIKE DOMAINS.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; M33753; G552087; ALT_SEQ.
DR EMBL; X05144; E1746; -.
DR EMBL; X05144; G929536; -.
DR PIR; B26637; B26637.
DR PIR; A35672; A35672.
DR FLYBASE; FBgn0000368; crb.
DR PROSITE; PS00010; ASX_HYDROXYL; 15.
DR PROSITE; PS00022; EGF_1; 26.
DR PROSITE; PS01186; EGF_2; 17.
DR PROSITE; PS01187; EGF_CA; 15.
DR PFAM; PF00008; EGF; 26.
DR PFAM; PF00054; laminin_G; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE;
KW GLYCOPROTEIN; SIGNAL; PHOSPHORYLATION.




FT   SIGNAL        1     90
FT   CHAIN        91   2139       CRUMBS PROTEIN.
FT   DOMAIN       91   2084       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM   2085   2111       POTENTIAL.
FT   DOMAIN     2112   2139       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN      267    303       EGF-LIKE 1.
FT   DOMAIN      306    343       EGF-LIKE 2.
FT   DOMAIN      348    386       EGF-LIKE 3.
FT   DOMAIN      388    425       EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      427    463       EGF-LIKE 5.
FT   DOMAIN      464    500       EGF-LIKE 6.
FT   DOMAIN      501    532       EGF-LIKE 7.
FT   DOMAIN      545    581       EGF-LIKE 8.
FT   DOMAIN      582    611       EGF-LIKE 9.
FT   DOMAIN      609    646       EGF-LIKE 10.
FT   DOMAIN      648    685       EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      687    723       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      725    761       EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      763    800       EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      802    838       EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      840    902       EGF-LIKE 16.
FT   DOMAIN      904    940       EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      942    978       EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      980   1021       EGF-LIKE 19.
FT   DOMAIN     1207   1243       EGF-LIKE 20.
FT   DOMAIN     1481   1517       EGF-LIKE 21.
FT   DOMAIN     1759   1795       EGF-LIKE 22.
FT   DOMAIN     1797   1833       EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1835   1871       EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1874   1915       EGF-LIKE 25.
FT   DOMAIN     1915   1951       EGF-LIKE 26.
FT   DOMAIN     1953   1989       EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1991   2029       EGF-LIKE 28, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     2030   2070       EGF-LIKE 29.
FT   DISULFID    271    282       BY SIMILARITY.
FT   DISULFID    276    291       BY SIMILARITY.
FT   DISULFID    293    302       BY SIMILARITY.
FT   DISULFID    310    321       BY SIMILARITY.
FT   DISULFID    315    331       BY SIMILARITY.
FT   DISULFID    333    342       BY SIMILARITY.
FT   DISULFID    352    363       BY SIMILARITY.
FT   DISULFID    357    374       BY SIMILARITY.
FT   DISULFID    376    385       BY SIMILARITY.
FT   DISULFID    392    403       BY SIMILARITY.
FT   DISULFID    397    412       BY SIMILARITY.
FT   DISULFID    414    424       BY SIMILARITY.
FT   DISULFID    431    442       BY SIMILARITY.
FT   DISULFID    436    451       BY SIMILARITY.
FT   DISULFID    453    462       BY SIMILARITY.
FT   DISULFID    468    479       BY SIMILARITY.
FT   DISULFID    473    488       BY SIMILARITY.
FT   DISULFID    490    499       BY SIMILARITY.
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FT 


nTcrii pm 

UlOULf 1JJ 


505 


515 


DV CTMTT ADTTV 
DI olMlbAKlli, 


FT CARBOHYD 550 550 POTENTIAL. 


FT 


nTCriT pm 


509 


520 


DV CTMTT ADTTV 

DI alMlLAKlll , 


FT CARBOHYD 565 565 POTENTIAL. 


FT 


U1DULC 1U 


522 


531 


RV CTM7TBDTTV 


FT CARBOHYD 736 736 POTENTIAL. 




nTcm pm 

UloULf ID 






DV CTMTT SOTTV 
bl SlHlLAKllI . 


FT CARBOHYD 746 746 POTENTIAL. 


FT 


nTcni PTn 
uioubr iu 


556 


569 


DV CTMTT ADTTV 

DI MMlLAKllI , 


FT CARBOHYD 860 860 POTENTIAL. 


FT 


nTCriT pm 

UlDULf 1U 


571 


580 


DV CTMTT fiDTTV 
DI OlMlLAlUl I , 


FT CARBOHYD 884 884 POTENTIAL. 


FT 


DISULFID 


586 


597 


RV CTMTT ARTTV 
DI OlnlLAKll I . 


FT CARBOHYD 976 976 POTENTIAL, 


FT 


nTcriT.FTn 

L/iOULf IU 


591 


602 


DV CTUTT fiDTTV 

DI MBlLAKlll , 


FT CARBOHYD 1102 1102 POTENTIAL. 




DTcni PTn 

UloULf IU 


604 




DV CTMTT RDTTV 

DI MMlLAKili. 


FT CARBOHYD 1114 1114 POTENTIAL. 


FT 


Pit cttt, PTn 
uiouLf iu 


613 


624 


DV CTMTT 1DTTV 
DI OiWlLAKli I . 


FT CARBOHYD 1138 1138 POTENTIAL, 




nTcrn pm 


618 




DV CTMTT ftOTTV 

bl oIMlLAKlli. 


FT CARBOHYD 1192 1192 POTENTIAL, 


FT 


DISULFID 


636 


645 


dv CTMTT ARTTV 

DI OlnlL/lRll I , 


FT CARBOHYD 1245 1245 POTENTIAL. 




nT?riT,PTn 

UlDULf 1LI 


652 


664 


DV ctmtt&dttv 
DI oiMlLAKUl. 


FT CARBOHYD 1255 1255 POTENTIAL, 


FT 


DISULFID 


659 


673 


BY SIMILARITY, 


PT riDRrtUVn 17t\A DMiPWPTUT 
11 LAKdUHIU 1J34 PljihNllAlj, 


FT 


DISULFID 


675 


684 


BY SIMILARITY. 


PT f&DDfMJVn nCl tWTPWiPTIVT 

11 UiKrJUnlU ijrjj ljrjj nJIbNIlAL, 


FT 


DISULFID 


691 


702 


BY SIMILARITY, 


pt riDDnuvn i j( >t i 1/1/11 nf\<rpxTipTi\T 
ci tAKoUHIU 1441 1441 fUlbNUAL, 




DISULFID 


696 


711 


oy CTMTT ARTTV 
DI DlnlLAlNll I . 


FT CARBOHYD 1454 1454 POTENTIAL. 


FT 


DISULFID 


713 


722 


BY SIMILARITY, 




FT 


DISULFID 


729 


740 


BY SIMILARITY . 


' ' ' , 

Note: remainder oi annotations omitted, 


FT 


DISULFID 


734 


749 


BY SIMILARITY, 




11 


nTcnr PTn 

UlDULf IU 






BY SIMILARITY, 


Query Match 24.3%; Score 282; DB 1; Length 2139; 


1 


DISULFID 


767 


778 


dv ctmtt ipttv 
di olnlLAKiii , 


Best Local Similarity 39,6%; Pred. No. 1 . 86e- 40 ; 




DISULFID 


772 


787 


DV CTMTT SBTTV 
DI OlMiLAKll I , 


Matches 38; Conservative 19; Mismatches 34; Indels 5; Gaps 5; 




DISULFID 


789 


799 


dv CTMTT ARTTV 
DI OlnlLAKll I , 






nTCriT PTn 
uiouLf iu 


806 




DV CTMTT RDTT'V 

DI ilMlLAKlll, 


Db 1821 CQCQPGFEGQHCEQNIDECADQPCHNGGNCTDLIASYVCDCPEDYMGPQCDVLKQMTCEN 1880 


PT 


nTcm pTn 

UloULf i.U 






DV CTMTT HDTT*V 

DI oiMlLAKIll. 


1:1 h h:| :| hi h hll: 1 1 : II 1 II 1 1 1" ' 


FT 


nTcm PTn 

UloULf 1U 


828 


837 


DV CTMTT RDTT>V 

bl oIMlLAKlli. 


Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEI-PP-A-PR 65 


FT 


DISULFID 




855 


DV CTMTT fiDTTV 
DI OiMlLAKIll , 




FT 


nTcnTFTn 

uiDULf iu 


849 


890 


RV CTMTT SBTTV 
DI OIMlLAKlli, 


Db 1881 EPCRNGSTCQNGFN-ASTGNNFTCTCVPGFEGPLCD 1915 


FT 


DISULFID 


892 


901 


RV CTMTTABTTV 
DI OlrllLnAll I , 


.i . i . 1 1 1 1 i i . i i . 1 1 1 ii i . 
:| :|: llll 1 |: 1 hill II |: 


FT 


DISULFID 


908 


919 


BY SIMILARITY. 


r\v cor , -p^TPr , ^KTr , iKTr , trr\^( , ODDi7r , ^T npnpnnmT? i(\t\ 


FT 


DISULFID 


913 


928 


dv CTMTT ARTTV 

DI DiniLnKll I . 


FT 


DISULFID 


930 


939 


RV CTMTT ADTTV 
DI OlrllLftftll I . 




FT 


DISULFID 


946 


957 


pv CTMTT ARTTV 
DI OlMlLAAli I , 


RESULT 11 


FT 


DISULFID 


952 


966 


DV CTMTT ARTTY 
DI OlrllLhAll I . 


ID NOTC_XENLA STANDARD; PRT; 2524 AA. 


FT 


DTCniiPTn 


968 


977 


DV CTMTT ABTTV 
DI OIMlLAKlli . 


oinoi. 
At P^l/oJ; 


p! 


nTcni PTn 
dioulf iu 


984 


995 


DV CTMTT 1DTTV 
DI OlHlLAKlll. 


DT 01 -MAY-1991 (REL. 18, CREATED) 




nTcru PTn 
UliULf iu 


989 




DV CTUTT ROTTlV 

di oiMUjAKili, 


DT 01-OCT-1996 (REL. 34, LAST SEQUENCE UPDATE) 




nTcnr PTn 
LHoULt IU 




inon 


BY SIMILARITY. 


DT 15 - JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 


FT 


DISULFID 


1211 


1222 


RV CTMTT SBTTV 
DI OlWlLAKli 1 , 


DE NEUROGENIC LOCOS NOTCH PROTEIN HOMOLOG PRECURSOR (XOTCH PROTEIN), 


FT 


DISULFID 


1216 


1231 


dv CTMTT.ARTTY 

DI OlnlLKKll 1 , 


ba XUllrl, 


FT 


DISULFID 


1233 


1242 


BY SIMILARITY . 


Uo AfiNUrUo LALVlo (At Kit AW ILAWHU IKUbJ. 


FT 


DISULFID 


1485 


1496 


RV CTMTT ARTTY 
DI 01P11LAK11 I . 


OC EOKARYOTA; METAZOA; CHORDATA; VERTEBRATA; AMPHIBIA; BATRACHIA; ANURA; 


FT 


nTcnr pth 

uioulf iu 


1490 


1505 


RV CTMTT ADTTV 
DI 01M1LAK11 I , 


OC MESOBATRACHIA; PIPOIDEA; PIPIDAE; XENOPODINAE; XENOPUS . 


FT 


DISULFID 


1507 


1516 


dv CJMTT.ARTTY 
DI 01P11LAK11 1 , 


RN [1] 


FT 


nTcnr PTn 

DloULF IU 


1763 


1774 


DV CTMTT SDTTV 
DI MMlLAKllI , 


RP SEQUENCE FROM N.A. 


FT 


DISULFID 


1768 


1783 


BY SIMILARITY, 


BY VPnTTHP, QfUfl^lfl^ 


FT 


DISULFID 


1785 


1794 


RY CTMTT.ARTTY 
di 01I"ULnKll I , 


D& PAPPMftM r UnDRTC U VTMTWPD r , 

KA UUfrMAW L.i HAKKls W., MNlNtK L| 


FT 


DISULFID 


1801 


1812 


BY SIMILARITY . 


RT "Xotch, the Xenopus homolog of Drosophila notch."; 


n 


DISULFID 


1806 


1821 


BY SIMILARITY, 


KL aUtNLtj llJ.llio 1441(l7?U). 




DISULFID 


1823 


1832 


BY SIMILARITY, 


RN [2] 


1 


DISULFID 


1839 


1850 


BY SIMILARITY , 


OD DTOTOTi*lHC IKD.nill 


FT 


DISULFID 


1844 


1859 


BY SIMILARITY , 


Dl IfTHTHPO r . 




nKni.PTn 

DlDULf if 


1861 


1870 


RV CTMTT SDTTV 
01 OIMlLAKlli . 


RL SUBMITTED (JUN-1996) TO EMBL/GENBANK/DDBJ DATA BANKS, 


FT 


DISULFID 


1878 


1889 


BY SIMILARITY, 


FC -1. CnDfPTTrTTBD Trt^HTTAM, TVUP T UPMRDBHP DDfWPTM 


FT 


DISULFID 


1883 


1903 


RV CTMTT.ADTTV 
DI 01M1LAK11 I . 


CC "!• DEVELOPMENTAL STAGE; EXPRESSED ALMOST UNIFORMLY IN EARLY EMBRYOS. 


FT 


DISULFID 


1905 


1914 


tjv CTMTT ARTTV 
DI OIMlLAKlli, 


CC -!- SIMILARITY; HIGH, WITH OTHER NOTCH-TYPE PROTEINS, 


FT 


nTcm PTn 

UlDULf IU 


1919 


1930 


RV CTMTT SBTTV 
DI OlnlLAKlI I , 


CC -|- SIMILARITY; CONTAINS 36 EGF"LIKE DOMAINS. 




nTcnr.PTn 

UloULf 1L 


1924 


1939 


RV CTMTT SDTTV 
DI OlMlLAKU 1 , 


CC -|- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS. 


PT 


nTctiTPTn 
uiouLr iu 




!neo 


RV CTMTT ADTTlV 

DI alMlbAKlli, 


CC ■!■ SIMILARITY: CONTAINS 6 ANK REPEATS. 


PT 


UloULr ID 


1CK7 


1968 


BY SIMILARITY. 


CC 




DISULFID 


1962 


1977 


BY SIMILARITY. 


CC This SWISS-PROT entry is copyright. It is produced through a collaboration 


PT 

FT 


nTcnr nn 
UlbULr IU 


1979 


1988 


BY SIMILARITY, 


CC between the Swiss Institute of Bioinformatics and the EMBL outstation ■ 




UlbULrlU 


1995 


2008 


BY SIMILARITY, 


CC the European Bioinformatics Institute. There are no restrictions on its 


FT 


nTcnr ptr 
UlaULr ID 


2002 


2017 


BY SIMILARITY, 


CC use by non-profit institutions as long as its content is in no way 


FT 


DISULFID 


2019 


2028 


BY SIMILARITY, 


CC mnHlflpH anH fhlQ cfafomGnf 1c nrtf romnvod fTcano h\t and fnr mfrnnorMal 
iiiuuiiicu auu who oUiLcmcUL xo UUL IcJIlUvyu. Uoayt! Uy QllU lUI LUnuilcILJ.a.1 


FT 


CARBOHYD 


37 


37 


POTENTIAL. 


CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 


FT 


CARBOHYD 


96 


96 


POTENTIAL. 


CC or send an email to license@isb-sib.ch) . 


FT 


CARBOHYD 


198 


198 


POTENTIAL. 


CC 


FT 


CARBOHYD 


238 


238 


POTENTIAL. 


DR EMBL; M33874; G1364263; -. 


FT 


CARBOHYD 


239 


239 


POTENTIAL. 


DR PIR; A35844; A35844.


FT 


CARBOHYD 


336 


336 


POTENTIAL, 


DR PROSITE; PS00010; ASX_HYDROXYL; 23.


FT 


CARBOHYD 


400 


400 


POTENTIAL, 


DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 29.

DR PROSITE; PS01187; EGF_CA; 21. 

DR PFAM; PF00008; EGF; 36. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA.

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN. 



FT 


SIGNAL 


1 


19 


POTENTIAL. 


FT 


CHAIN 


20 


2524 


NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG. 


FT 


DOMAIN 


20 


1728 


EXTRACELLULAR (POTENTIAL), 


FT 


TRANSMEM 


1729 


1750 


POTENTIAL. 


FT 


DOMAIN 


1751 


2524 


CYTOPLASMIC (POTENTIAL). 


FT 


DOMAIN 


20 


57 


EGF-LIKE 1.


FT 


DOMAIN 


58 


99 


EGF-LIKE 2. 


FT 


DOMAIN 


102 


140 


EGF-LIKE 3. 


FT


DOMAIN 


141 


177 


EGF-LIKE 4, 


FT


DOMAIN 


179 


215 


EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL). 


FT


DOMAIN 


217 


254 


EGF-LIKE 6. 


FT 


DOMAIN 


256 


292 


EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).


FT 


DOMAIN 


294 


332 


EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


334 


370 


EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


371 


409 


EGF-LIKE 10, 


FT 


DOMAIN 


411 


449 


EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL) 


FT 


DOMAIN 


451 


487 


EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL) 


FT 


DOMAIN 


489 


525 


EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL) 


FT 


DOMAIN 


527 


563 


EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL) 


FT 


DOMAIN 


565 


600 


EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL) 


FT 


DOMAIN 


602 


638 


EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL) 


FT 


DOMAIN 


640 


675 


EGF-LIKE 17, 


FT 


DOMAIN 


677 


713 


EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL) 


FT 


DOMAIN 


715 


750 


EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL) 


FT 


DOMAIN 


752 


788 


EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL) 


FT 


DOMAIN 


790 


826 


EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL) 


FT 


DOMAIN 


828 


866 


EGF-LIKE 22, 


FT 


DOMAIN 


868 


904 


EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL) 


FT 


DOMAIN 


906


942


EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL)


FT


DOMAIN


944


980


EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL) 


FT 


DOMAIN 


982 


1018 


EGF-LIKE 26. 


FT 


DOMAIN 


1020 


1056 


EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL) 


FT 


DOMAIN 


1058 


1094 


EGF-LIKE 28. 


FT 


DOMAIN 


1096 


1142 


EGF-LIKE 29. 


FT 


DOMAIN 


1144 


1180 


EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL) 


FT 


DOMAIN 


1182 


1218 


EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL) 


FT


DOMAIN 


1220 


1264 


EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL) 


FT


DOMAIN 


1266 


1304 


EGF-LIKE 33. 


FT


DOMAIN 


1306 


1346 


EGF-LIKE 34. 


FT 


DOMAIN 


1347 


1383 


EGF-LIKE 35. 


FT 


DOMAIN 


1386 


1424 


EGF-LIKE 36. 


FT 


DOMAIN 


1441 


1560 


3 X LIN/NOTCH REPEATS. 


FT 


REPEAT 


1441 


1478 


LIN/NOTCH 1, 


FT 


REPEAT 


1479 


1520 


LIN/NOTCH 2, 


FT 


REPEAT 


1521 


1560 


LIN/NOTCH 3, 


FT 


DOMAIN 


1871 


2083 


6 X ANK MOTIF REPEATS. 


FT 


DISULFID 


22 


35 


BY SIMILARITY. 


FT 


DISULFID 


29 


45 


BY SIMILARITY, 


FT 


DISULFID 


47 


56 


BY SIMILARITY. 


FT 


DISULFID 


62 


74 


BY SIMILARITY. 


FT 


DISULFID 


68 


87 


BY SIMILARITY, 


FT 


DISULFID 


89 


98 


BY SIMILARITY. 


FT 


DISULFID 


106 


117 


BY SIMILARITY. 


FT 


DISULFID 


111 


128 


BY SIMILARITY. 


FT 


DISULFID 


130 


139 


BY SIMILARITY. 


FT 


DISULFID 


145 


156 


BY SIMILARITY. 


FT 


DISULFID 


150 


165 


BY SIMILARITY. 


FT 


DISULFID 


167 


176 


BY SIMILARITY, 


FT 


DISULFID 


183 


194 


BY SIMILARITY. 


FT 


DISULFID 


188 


203 


BY SIMILARITY. 


FT 


DISULFID 


205 


214 


BY SIMILARITY. 


FT 


DISULFID 


221 


232 


BY SIMILARITY. 


FT 


DISULFID 


226 


242 


BY SIMILARITY, 


FT 


DISULFID 


244 


253 


BY SIMILARITY. 


FT 


DISULFID 


260 


271 


BY SIMILARITY, 



FT 


DISULFID 


265 


280 


BY SIMILARITY. 


FT 


DISULFID 


282 


291 


BY SIMILARITY. 


FT 


DISULFID 


298 


311 


BY SIMILARITY. 


FT 


DISULFID 


305 


320 


BY SIMILARITY. 


FT 


DISULFID 


322 


331 


BY SIMILARITY. 


FT 


DISULFID 


338 


349 


BY SIMILARITY. 


FT 


DISULFID 


343 


358 


BY SIMILARITY, 


FT 


DISULFID 


360 


369 


BY SIMILARITY. 


FT 


DISULFID 


375 


386 


BY SIMILARITY. 


FT 


DISULFID 


380 


397 


BY SIMILARITY. 


FT 


DISULFID 


399 


408 


BY SIMILARITY. 


FT


DISULFID 


415 


428 


BY SIMILARITY. 


FT 


DISULFID 


422 


437 


BY SIMILARITY. 


FT 


DISULFID 


439 


448 


BY SIMILARITY. 


FT 


DISULFID 


455 


466 


BY SIMILARITY. 


FT 


DISULFID 


460 


475 


BY SIMILARITY. 


FT 


DISULFID 


477 


486 


BY SIMILARITY. 


FT 


DISULFID 


493 


504 


BY SIMILARITY. 


FT 


DISULFID 


498 


513 


BY SIMILARITY. 


FT 


DISULFID 


515 


524 


BY SIMILARITY. 


FT 


DISULFID 


531 


542 


BY SIMILARITY. 


FT 


DISULFID 


536 


551 


BY SIMILARITY. 


FT 


DISULFID 


553 


562 


BY SIMILARITY. 


FT 


DISULFID 


569 


579 


BY SIMILARITY. 


FT 


DISULFID 


574 


588 


BY SIMILARITY. 


FT 


DISULFID 


590 


599 


BY SIMILARITY. 


FT 


DISULFID 


606 


617 


BY SIMILARITY. 


FT 


DISULFID 


611 


626 


BY SIMILARITY. 


FT 


DISULFID 


628 


637 


BY SIMILARITY. 


FT 


DISULFID 


644 


654 


BY SIMILARITY, 


FT 


DISULFID 


649 


663 


BY SIMILARITY, 


FT 


DISULFID 


665 


674 


BY SIMILARITY. 


FT 


DISULFID 


681 


692 


BY SIMILARITY. 


FT 


DISULFID 


686 


701 


BY SIMILARITY. 


FT 


DISULFID 


703 


712 


BY SIMILARITY, 


FT 


DISULFID 


719 


729 


BY SIMILARITY, 


FT 


DISULFID 


724 


738 


BY SIMILARITY. 


FT 


DISULFID 


740 


749 


BY SIMILARITY, 


FT 


DISULFID 


756 


767 


BY SIMILARITY. 


FT 


DISULFID 


761 


776 


BY SIMILARITY. 


FT 


DISULFID 


778 


787 


BY SIMILARITY. 


FT 


DISULFID 


794 


805 


BY SIMILARITY. 


FT 


DISULFID 


799 


814 


BY SIMILARITY. 


FT 


DISULFID 


816 


825 


BY SIMILARITY. 


FT 


DISULFID 


832 


843 


BY SIMILARITY. 


FT 


DISULFID


837 


854 


BY SIMILARITY. 


FT 


DISULFID 


856 


865 


BY SIMILARITY, 


FT 


DISULFID 


872 


883 


BY SIMILARITY. 


FT 


DISULFID 


877 


892 


BY SIMILARITY, 


FT 


DISULFID 


894 


903 


BY SIMILARITY. 


FT 


DISULFID 


910 


921 


BY SIMILARITY. 


FT 


DISULFID 


915 


930 


BY SIMILARITY. 


FT 


DISULFID 


932 


941 


BY SIMILARITY. 


FT 


DISULFID 


986 


997 


BY SIMILARITY, 


FT 


DISULFID 


991 


1006 


BY SIMILARITY, 


FT 


DISULFID 


1008 


1017 


BY SIMILARITY. 


FT 


DISULFID 


1024 


1035 


BY SIMILARITY. 


FT 


DISULFID 


1029 


1044 


BY SIMILARITY, 


FT 


DISULFID 


1046 


1055 


BY SIMILARITY. 


FT 


DISULFID 


1062 


1073 


BY SIMILARITY, 


FT 


DISULFID 


1067 


1082 


BY SIMILARITY, 


FT 


DISULFID 


1084 


1093 


BY SIMILARITY. 


FT 


DISULFID 


1100 


1121 


BY SIMILARITY. 


FT 


DISULFID 


1115 


1130 


BY SIMILARITY. 


FT 


DISULFID 


1132 


1141 


BY SIMILARITY. 


FT 


DISULFID 


1148 


1159 


BY SIMILARITY. 


FT 


DISULFID 


1153 


1168 


BY SIMILARITY. 


FT 


DISULFID 


1170 


1179 


BY SIMILARITY. 


FT 


DISULFID 


1186 


1197 


BY SIMILARITY, 


FT 


DISULFID 


1191 


1206 


BY SIMILARITY. 


FT 


DISULFID 


1208 


1217 


BY SIMILARITY. 


FT 


DISULFID 


1224 


1243 


BY SIMILARITY. 


FT 


DISULFID 


1237 


1252 


BY SIMILARITY. 
FT


DISULFID 


1254 


1263 


BY SIMILARITY.


FT 


DISULFID 


1270 


1283 


BY SIMILARITY. 


FT 


DISULFID 


1275 


1292 


BY SIMILARITY, 


FT 


DISULFID 


1294 


1303 


BY SIMILARITY. 


FT 


DISULFID 


1310 


1321 


BY SIMILARITY, 


FT 


DISULFID 


1315 


1333 


BY SIMILARITY, 


FT 


DISULFID 


1335 


1344 


BY SIMILARITY, 


FT 


DISULFID 


1351 


1362 


BY SIMILARITY, 


FT 


DISULFID 


1356 


1371 


BY SIMILARITY, 


FT 


DISULFID 


1373 


1382 


BY SIMILARITY, 


FT 


DISULFID 


1390 


1401 


BY SIMILARITY, 


FT 


DISULFID 


1395 


1412 


BY SIMILARITY, 


FT 


DISULFID 


1414 


1423 


BY SIMILARITY, 


FT 


CARBOHYD 


462 


462 


POTENTIAL. 


FT 


CARBOHYD 


887 


887 


POTENTIAL. 



Note: remainder of annotations omitted.

Query Match 24.3%; Score 282; DB 1; Length 2524; 

Best Local Similarity 35.0%; Pred. No. 1.86e-40;

Matches 35; Conservative 25; Mismatches 36; Indels 4; Gaps 2; 

Db 158 PFEIQUCKCPPGFHGATCKQDINECSQNPCKNGGQCINEFGSYRCTCQNRFTGRNCDEP 217
I: II |: I I :: ::| :: I ||:||::| II I I : ::|: |: | 
Qy 2 PLPVHHRCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIP 61

Db 218 YVP-- -CNPSPCLNGGTCRQTDDTSYDCTCLPGFSGQNCE 254 

I I: : I II: I : : I llllhl :|| 
Qy 62 PAPRSSCEGTECQNGANCVDQGSRPV-CQCLPGFGGPECE 100 



RESULT 12 

ID NTC1_RAT STANDARD; PRT; 2531 AA.
AC Q07008; 

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1 PRECURSOR. 

GN NOTCH1. 

OS RATTUS NORVEGICUS (RAT). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1] 

RP SEQUENCE FROM N.A. 
RC TISSUE=SCHWANN CELL; 
RX MEDLINE; 92111383. 
RA WEINMASTER G., ROBERTS V.J., LEMKE G.; 
RT "A homolog of Drosophila Notch expressed during mammalian 
RT development."; 
RL DEVELOPMENT 113:199-205(1991).

CC -!- FUNCTION: REQUIRED FOR THE CORRECT DIFFERENTIATION OF A NUMBER
CC OF TISSUES.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DEVELOPMENTAL STAGE: IN THE EMBRYO, HIGHEST LEVELS OCCUR BETWEEN
CC DAYS 12 AND 14 AND DECREASE RAPIDLY TO MUCH LOWER LEVELS IN THE
CC ADULT.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; X57405; G57635; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS00022; EGF_1; 35.

DR PROSITE; PS01186; EGF_2; 26.



DR PROSITE; PS01187; EGF_CA; 21.

DR PFAM; PF00008; EGF; 35.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3.

DR HSSP; P00740; 1IXA.

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.


FT 


SIGNAL 


1 


18 


POTENTIAL. 


FT 


CHAIN 


19 


2531 


NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1. 




DOMAIN 


19 


1723 


EXTRACELLULAR (POTENTIAL). 


FT 


TRANSMEM 


1724 


1746 


POTENTIAL. 




DOMAIN 


1747 


2531 


CYTOPLASMIC (POTENTIAL) . 


FT 


DOMAIN 


20 


58 


EGF-LIKE 1. 


FT 


DOMAIN 


59 


99 


EGF-LIKE 2, 


FT


DOMAIN 


102 


139 


EGF-LIKE 3. 




DOMAIN 


140 


176 


EGF-LIKE 4. 


FT


DOMAIN 


178 


216 


EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL). 




DOMAIN 


218 


255 


EGF-LIKE 6. 




DOMAIN 


257 


293 


EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL), 


FT 


DOMAIN 


295 


333 


EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL) , 


FT 


DOMAIN 


335 


371 


EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


372 


410 


EGF-LIKE 10. 


FT 


DOMAIN 


412 


450 


EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


452 


488 


EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


490 


526 


EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


528 


564 


EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


566 


601 


EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL). 




DOMAIN 


603 


639 


EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


641 


676 


EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


678 


714 


EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL). 




DOMAIN 


716 


751 


EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


753 


789 


EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


791 


827 


EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


829 


867 


EGF-LIKE 22. 


FT 


DOMAIN 


869 


905 


EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL) . 


FT 


DOMAIN 


907 


943 


EGF-LIKE 24 . 


FT 


DOMAIN 


945 


981 


EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


983 


1019 


EGF-LIKE 26. 


FT 


DOMAIN 


1021 


1057 


EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


1059 


1095 


EGF-LIKE 28, 


FT 


DOMAIN 


1097 


1143 


EGF-LIKE 29. 


FT 


DOMAIN 


1145 


1181 


EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL) . 


FT 


DOMAIN


1183 


1219 


EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


1221 


1265 


EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


1267 


1305 


EGF-LIKE 33. 


FT 


DOMAIN 


1307 


1346 


EGF-LIKE 34. 


FT


DOMAIN 


1348 


1384 


EGF-LIKE 35, 


FT 


DOMAIN 


1387 


1426 


EGF-LIKE 36, 




DOMAIN 


1449 


1462 


CYS-RICH, 


FT 


DOMAIN 


1865 


2076 


6 X ANK MOTIF REPEATS. 




REPEAT 


1865 


1910 


ANK MOTIF 1. 


FT 


REPEAT 


1912 


1942 


ANK MOTIF 2. 


FT 


REPEAT 


1944 


1975 


ANK MOTIF 3 . 




REPEAT 


1978 


2009 


ANK MOTIF 4 . 


FT 


REPEAT 


2011 


2042 


ANK MOTIF 5. 


FT 


REPEAT 


2044 


2076 


ANK MOTIF 6. 


FT 


DISULFID 


24 


37 


BY SIMILARITY. 


FT 


DISULFID 


31 


46 


BY SIMILARITY. 


FT 


DISULFID 


48 


57 


BY SIMILARITY. 


FT 


DISULFID 


63 


74 


BY SIMILARITY. 


FT 


DISULFID 


68 


87 


BY SIMILARITY. 


FT 


DISULFID 


89 


98 


BY SIMILARITY. 


FT 


DISULFID 


106 


117 


BY SIMILARITY, 


FT 


DISULFID 


111 


127 


BY SIMILARITY. 


FT 


DISULFID 


129 


138 


BY SIMILARITY. 


FT 


DISULFID 


144 


155 


BY SIMILARITY. 


FT 


DISULFID 


149 


164 


BY SIMILARITY. 


FT 


DISULFID 


166 


175 


BY SIMILARITY. 


FT 


DISULFID 


182 


195 


BY SIMILARITY, 


FT 


DISULFID 


189 


204 


BY SIMILARITY, 


FT 


DISULFID 


206 


215 


BY SIMILARITY, 


FT 


DISULFID 


222 


233 


BY SIMILARITY, 


FT 


DISULFID 


227 


243 


BY SIMILARITY. 
FT


DISULFID 


245 


254 


BY SIMILARITY 


FT 


DISULFID 


261 


272 


BY SIMILARITY 


FT 


DISULFID 


266 


281 


BY SIMILARITY 


FT 


DISULFID 


283 


292 


BY SIMILARITY 


FT 


DISULFID 


299 


312 


BY SIMILARITY 


FT 


DISULFID 


306 


321 


BY SIMILARITY 


FT 


DISULFID 


323 


332 


BY SIMILARITY 


FT 


DISULFID 


339 


350 


BY SIMILARITY 


FT 


DISULFID 


344 


359 


BY SIMILARITY 


FT 


DISULFID 


361 


370 


BY SIMILARITY 


FT 


DISULFID 


376 


387 


BY SIMILARITY 


FT 


DISULFID 


381 


398 


BY SIMILARITY 


FT 


DISULFID 


400 


409 


BY SIMILARITY 


FT 


DISULFID 


416 


429 


BY SIMILARITY 


FT 


DISULFID 


423 


438 


BY SIMILARITY 


FT 


DISULFID 


440 


449 


BY SIMILARITY 


FT


DISULFID 


456 


467 


BY SIMILARITY 


FT


DISULFID 


461 


476 


BY SIMILARITY 




DISULFID 


478 


487 


BY SIMILARITY 


FT 


DISULFID 


494 


505 


BY SIMILARITY 


FT 


DISULFID 


499 


514 


BY SIMILARITY 


FT 


DISULFID 


516 


525 


BY SIMILARITY 


FT 


DISULFID 


532 


543 


BY SIMILARITY 


FT 


DISULFID 


537 


552 


BY SIMILARITY 


FT 


DISULFID 


554 


563 


BY SIMILARITY 


FT


DISULFID 


570 


580 


BY SIMILARITY 


FT 


DISULFID 


575 


589 


BY SIMILARITY 


FT 


DISULFID 


591 


600 


BY SIMILARITY 


FT 


DISULFID 


607 


618 


BY SIMILARITY 


FT 


DISULFID 


612 


627 


BY SIMILARITY 


FT 


DISULFID 


629 


638 


BY SIMILARITY 


FT 


DISULFID 


645 


655 


BY SIMILARITY 


FT 


DISULFID 


650 


664 


BY SIMILARITY 


FT 


DISULFID 


666 


675 


BY SIMILARITY 


FT 


DISULFID 


682 


693 


BY SIMILARITY 


FT 


DISULFID 


687 


702 


BY SIMILARITY 


FT 


DISULFID 


704 


713 


BY SIMILARITY 


FT 


DISULFID 


720 


730 


BY SIMILARITY. 


FT 


DISULFID 


725 


739 


BY SIMILARITY 


FT 


DISULFID 


741 


750 


BY SIMILARITY, 


FT 


DISULFID 


757 


768 


BY SIMILARITY. 


FT 


DISULFID 


762 


777 


BY SIMILARITY. 


FT 


DISULFID 


779 


788 


BY SIMILARITY, 


FT 


DISULFID 


795 


806 


BY SIMILARITY. 


FT


DISULFID 


800 


815 


BY SIMILARITY. 


FT


DISULFID 


817 


826 


BY SIMILARITY. 


FT


DISULFID 


833 


844 


BY SIMILARITY. 


FT 


DISULFID 


838 


855 


BY SIMILARITY. 


FT 


DISULFID 


857 


866 


BY SIMILARITY. 


FT 


DISULFID 


873 


884 


BY SIMILARITY. 


FT 


DISULFID 


878 


893 


BY SIMILARITY. 


FT 


DISULFID 


895 


904 


BY SIMILARITY, 


FT 


DISULFID 


911 


922 


BY SIMILARITY. 


FT 


DISULFID 


916 


931 


BY SIMILARITY. 


FT 


DISULFID 


933 


942 


BY SIMILARITY. 


FT 


DISULFID 


987 


998 


BY SIMILARITY. 


FT 


DISULFID 


992 


1007 


BY SIMILARITY. 


FT 


DISULFID 


1009 


1018 


BY SIMILARITY, 


FT 


DISULFID 


1025 


1036 


BY SIMILARITY. 


FT 


DISULFID 


1030 


1045 


BY SIMILARITY. 


FT 


DISULFID 


1047 


1056 


BY SIMILARITY, 


FT 


DISULFID 


1063 


1074 


BY SIMILARITY, 


FT 


DISULFID 


1068 


1083 


BY SIMILARITY, 


FT 


DISULFID 


1085 


1094 


BY SIMILARITY, 


FT 


DISULFID 


1101 


1122 


BY SIMILARITY. 


FT 


DISULFID 


1116 


1131 


BY SIMILARITY. 


FT 


DISULFID


1133 


1142 


BY SIMILARITY. 


FT 


DISULFID 


1149 


1160 


BY SIMILARITY. 


FT 


DISULFID 


1154 


1169 


BY SIMILARITY. 


FT 


DISULFID 


1171 


1180 


BY SIMILARITY. 


FT 


DISULFID 


1187 


1198 


BY SIMILARITY. 


FT 


DISULFID 


1192 


1207 


BY SIMILARITY. 


FT 


DISULFID 


1209 


1218


BY SIMILARITY. 



FT 


DISULFID 


1225 


1244 


BY SIMILARITY. 


FT 


DISULFID 


1238 


1253 


BY SIMILARITY.


FT 


DISULFID 


1255 


1264 


BY SIMILARITY. 


FT 


DISULFID 


1271 


1284 


BY SIMILARITY. 


FT 


DISULFID 


1276 


1293 


BY SIMILARITY, 


FT 


DISULFID 


1295 


1304 


BY SIMILARITY. 


FT 


DISULFID 


1311 


1322 


BY SIMILARITY. 


FT 


DISULFID 


1316 


1334 


BY SIMILARITY, 


FT 


DISULFID 


1336 


1345 


BY SIMILARITY. 


FT 


DISULFID 


1352 


1363 


BY SIMILARITY, 


FT 


DISULFID 


1357 


1372 


BY SIMILARITY. 


FT 


DISULFID 


1374 


1383 


BY SIMILARITY, 


FT 


DISULFID 


1391 


1403 


BY SIMILARITY, 



Note: remainder of annotations omitted.

Query Match 24.3%; Score 282; DB 1; Length 2531;

Best Local Similarity 37.0%; Pred. No. 1.86e-40;

Matches 34; Conservative 21; Mismatches 34; Indels 3; Gaps 2;

Db 438 CQCLQGYTGPRCEIDVNECISNPCQNDATCLDQIGEFQCICMPGYEGVYCEIN-T--DEC 494 

1:1: llll I : -I : III I |:|:: : |:|: II I III : | 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 495 ASSPCLHNGRCVDKINEFLCQCPKGFSGHLCQ 526 

:: I : : III : :||| l|:| I: 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100



RESULT 13 

ID NTC1_MOUSE STANDARD; PRT; 2531 AA.

AC Q01705;

DT 01-NOV-1995 (REL. 32, CREATED)

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1 PRECURSOR (MOTCH PROTEIN).

GN NOTCH1 OR MOTCH.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=EMBRYO;

RX MEDLINE; 93194170. 

RA FRANCO DEL AMO P., GENDRON-MAGUIRE M., SWIATEK P.J., JENKINS N.A., 

RA COPELAND N.G., GRIDLEY T.; 

RT "Cloning, analysis, and chromosomal localization of Notch-1, a mouse

RT homolog of Drosophila Notch.";

RL GENOMICS 15:259-264(1993). 

RN [2] 

RP SEQUENCE OF 1551-2170 FROM N.A. 

RC TISSUE=EMBRYO;

RX MEDLINE; 93048835. 

RA FRANCO DEL AMO P., SMITH D.E., SWIATEK P.J., GENDRON-MAGUIRE M., 

RA GREENSPAN R.J., MCMAHON A.P., GRIDLEY T.;

RT "Expression pattern of Motch, a mouse homolog of Drosophila Notch,

RT suggests an important role in early postimplantation mouse

RT development.";

RL DEVELOPMENT 115:737-744(1992).

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- DEVELOPMENTAL STAGE: EXPRESSED ALMOST UNIFORMLY IN EARLY EMBRYOS.

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; Z11886; G288503; -. 

DR MGD; MGI:97363; NOTCH1.

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 27.

DR PROSITE; PS01187; EGF_CA; 21.

DR PFAM; PF00008; EGF; 35.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3.

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.



FT 


SIGNAL 


1 


18 


POTENTIAL. 


FT 


CHAIN 


19 


2531 


NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1. 


FT 


DOMAIN 


19 


1725 


EXTRACELLULAR (POTENTIAL). 


FT 


TRANSMEM 


1726 


1746 


POTENTIAL. 


FT 


DOMAIN 


1747 


2531 


CYTOPLASMIC (POTENTIAL), 


FT 


DOMAIN 


24 


1425 


36 X EGF-TYPE REPEATS, 


FT 


DOMAIN 


1449 


1462 


CYS-RICH, 


FT 


DOMAIN 


1445 


1562 


3 X LIN/NOTCH REPEATS, 




REPEAT 


1445 


1480 


LIN/NOTCH 1. 


FT


REPEAT 


1481 


1522 


LIN/NOTCH 2. 




REPEAT 


1523 


1562 


LIN/NOTCH 3. 


FT 


DOMAIN 


1865 


2075 


6 X ANK MOTIF REPEATS, 


FT 


REPEAT 


1865 


1910 


ANK MOTIF 1. 


FT 


REPEAT 


1912 


1942 


ANK MOTIF 2, 


FT 


REPEAT


1944 


1975 


ANK MOTIF 3. 


FT 


REPEAT 


1978 


2009 


ANK MOTIF 4, 


FT 


REPEAT 


2011 


2042 


ANK MOTIF 5, 


FT 


REPEAT 


2044 


2075 


ANK MOTIF 6, 


FT 


CARBOHYD 


888 


888 


POTENTIAL. 


FT 


CARBOHYD 


959 


959 


POTENTIAL. 


FT 


CARBOHYD 


1179 


1179 


POTENTIAL. 


FT 


CARBOHYD 


1241 


1241 


POTENTIAL. 


FT 


CARBOHYD 


1489 


1489 


POTENTIAL. 


FT 


CARBOHYD 


1587 


1587 


POTENTIAL. 


SQ


SEQUENCE 


2531 


AA; 271312 MW; AD71189B CRC32; 


Query Match 24.1%; Score 279; DB 1; Length 2531;



Best Local Similarity 41.3%; Pred. No. 8.95e-40; 
Matches 38; Conservative 17; Mismatches 34; Indels 3; Gaps 2; 

Db 931 CDCLPGFQGAFCEEDINECASNPCQNGANCTDCVDSYTCTCPVGFNGIHCE-NNTP-DC 987 

hi: h! I h ::l : Mill I I MM I Ml I :| I 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 988 TESSCFNGGTCVDGINSFTCLCPPGFTGSYCQ 1019 
: I II: III : II III |: |: 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100



RESULT 14 

ID NTC1_HUMAN STANDARD; PRT; 2444 AA.

AC P46531;

DT 01-NOV-1995 (REL. 32, CREATED)

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)

DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG 1 PRECURSOR (TRANSLOCATION-

DE ASSOCIATED NOTCH PROTEIN TAN-1) (FRAGMENT). 

GN NOTCH1 OR TAN1. 

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO.

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 91347367.

RA ELLISEN L.W., BIRD J., WEST D.C., SORENG A.L., REYNOLDS T.C., 

RA SMITH S.D., SKLAR J.; 

RT "TAN-1, the human homolog of the Drosophila notch gene, is broken by 

RT chromosomal translocations in T lymphoblastic neoplasms."; 

RL CELL 66:649-661(1991). 

CC -!- FUNCTION: MAY BE IMPORTANT FOR NORMAL LYMPHOCYTE FUNCTION, IN 



CC ALTERED FORM, MAY CONTRIBUTE TO TRANSFORMATION OR PROGRESSION 

CC IN SOME T-CELL NEOPLASMS. 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- TISSUE SPECIFICITY: IN FETAL TISSUES MOST ABUNDANT IN SPLEEN,

CC BRAIN STEM AND LUNG. ALSO PRESENT IN MOST ADULT TISSUES WHERE IT

CC IS FOUND MAINLY IN LYMPHOID TISSUES.

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS. 

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; M73980; G338675; -.

DR MIM; 190198; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 20.

DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 26.

DR PROSITE; PS01187; EGF_CA; 18.

DR PFAM; PF00008; EGF; 35. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN. 



FT 


SIGNAL 


1 


18 


POTENTIAL. 




CHAIN 


19 


>2444


NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG 1.


FT 


DOMAIN 


19 


1736 


EXTRACELLULAR (POTENTIAL). 


FT 


TRANSMEM 


1737 


1757 


POTENTIAL. 


FT 


DOMAIN 


1758 


>2444 


CYTOPLASMIC (POTENTIAL) . 


FT 


DOMAIN 


20 


58 


EGF-LIKE 1. 


FT 


DOMAIN 


59 


99 


EGF-LIKE 2. 


FT 


DOMAIN 


102 


139 


EGF-LIKE 3. 


FT 


DOMAIN 


140 


176 


EGF-LIKE 4. 


FT 


DOMAIN 


178 


216 


EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


218 


255 


EGF-LIKE 6. 


FT 


DOMAIN 


257 


293 


EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


295 


333 


EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


335 


371 


EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


372 


410 


EGF-LIKE 10. 


FT 


DOMAIN 


412 


450 


EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


452 


488 


EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


490 


526 


EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL) . 


FT 


DOMAIN 


528 


564 


EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


566 


601 


EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL), 


FT 


DOMAIN


603 


639 


EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


641 


676 


EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL), 


FT 


DOMAIN 


678 


714 


EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL), 


FT 


DOMAIN 


716 


751 


EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


753 


789 


EGF-LIKE 20, 


FT 


DOMAIN 


791 


827 


EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


829 


868 


EGF-LIKE 22. 


FT 


DOMAIN 


870 


906 


EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL), 


FT 


DOMAIN 


908 


944 


EGF-LIKE 24. 


FT 


DOMAIN 


946 


982 


EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


984 


1020 


EGF-LIKE 26. 


FT 


DOMAIN 


1022 


1058 


EGF-LIKE 27 , 


FT 


DOMAIN 


1060 


1096 


EGF-LIKE 28. 


FT 


DOMAIN 


1098 


1144 


EGF-LIKE 29. 


FT


DOMAIN 


1146 


1182 


EGF-LIKE 30, 


FT 


DOMAIN 


1184 


1220 


EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 


1222 


1266 


EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL) , 


FT 


DOMAIN 


1268 


1306 


EGF-LIKE 33. 


FT 


DOMAIN 


1308 


1347 


EGF-LIKE 34. 


FT 


DOMAIN 


1349 


1385 


EGF-LIKE 35. 


FT 


DOMAIN 


1388 


1427 


EGF-LIKE 36, 


FT 


DOMAIN 


1446 


1563 


3 X LIN/NOTCH REPEATS. 



FT REPEAT 

FT REPEAT 

FT REPEAT 

FT DOMAIN 

FT REPEAT 

FT REPEAT 

FT REPEAT 

FT REPEAT 

FT REPEAT 

FT REPEAT 

FT DOMAIN 

FT DOMAIN 

FT DOMAIN 

FT DOMAIN 

FT DOMAIN 

FT DOMAIN

FT DOMAIN

FT DOMAIN

FT DISULFID

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID

FT DISULFID

FT DISULFID

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 



1446 1481 

1482 1523 

1524 1563 

1876 2087 

1876 1921 

1923 1954 

1956 1987 

1990 2021 

2023 2054 

2056 2087 

1576 1579 

1662 1665 

1729 1732 

1741 1744 

1902 1905 

2260 2263 

2404 2407 

2411 2418 



106 
111 
129 
144 
149 
166 
182 
189 
206 
222 
227 
245 
261 
266 
283 
299 
306 
323 
339 
344 
361 
376 
381 
400 
416 
423 
440 
456 
461 
478 
494 
499 
516 
532 
537 
554 
570 
575 
591 
607 
612 
629 
645
650 
666 
682 
687 
704 
720 



37 
46 
57 
74 
87 
98 
117 
127 
138 
155 
164 
175 
195 
204 
215 
233 
243 
254 
272 
281 
292 
312 
321 
332 
350
359 
370 
387 
398 
409 
429 
438 
449 
467 
476 
487 
505 
514 
525 
543 
552 
563 
580 
589 
600 
618 
627 
638 
655 
664 
675 
693 
702 
713 
730 



LIN/NOTCH 1. 

LIN/NOTCH 2. 

LIN/NOTCH 3. 

6 X ANK MOTIF REPEATS. 

ANK MOTIF 1. 

ANK MOTIF 2, 

ANK MOTIF 3. 

ANK MOTIF 4. 

ANK MOTIF 5, 

ANK MOTIF 6. 

POLY-VAL. 

POLY-ARG.

POLY-PRO.

POLY-ALA. 

POLY-GLU. 

POLY-GLY. 

POLY-GLN. 

POLY-PRO.

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY, 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY, 

BY SIMILARITY, 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY, 

BY SIMILARITY. 

BY SIMILARITY, 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY, 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY, 

BY SIMILARITY, 

BY SIMILARITY. 

BY SIMILARITY, 

BY SIMILARITY, 

BY SIMILARITY, 

BY SIMILARITY, 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY, 

BY SIMILARITY, 

BY SIMILARITY, 

BY SIMILARITY, 

BY SIMILARITY, 

BY SIMILARITY, 

BY SIMILARITY. 

BY SIMILARITY, 

BY SIMILARITY, 

BY SIMILARITY. 

BY SIMILARITY. 

BY SIMILARITY. 



FT   DISULFID    725    739       BY SIMILARITY.
FT   DISULFID    741    750       BY SIMILARITY.
FT   DISULFID    757    768       BY SIMILARITY.
FT   DISULFID    762    777       BY SIMILARITY.
FT   DISULFID    779    788       BY SIMILARITY.
FT   DISULFID    795    806       BY SIMILARITY.
FT   DISULFID    800    815       BY SIMILARITY.
FT   DISULFID    817    826       BY SIMILARITY.
FT   DISULFID    833    844       BY SIMILARITY.
FT   DISULFID    838    855       BY SIMILARITY.
FT   DISULFID    857    867       BY SIMILARITY.
FT   DISULFID    874    885       BY SIMILARITY.
FT   DISULFID    879    894       BY SIMILARITY.
FT   DISULFID    896    905       BY SIMILARITY.
FT   DISULFID    912    923       BY SIMILARITY.
FT   DISULFID    917    932       BY SIMILARITY.
FT   DISULFID    934    943       BY SIMILARITY.
FT   DISULFID    988    999       BY SIMILARITY.
FT   DISULFID    993   1008       BY SIMILARITY.
FT   DISULFID   1010   1019       BY SIMILARITY.
FT   DISULFID   1026   1037       BY SIMILARITY.
FT   DISULFID   1031   1046       BY SIMILARITY.
FT   DISULFID   1048   1057       BY SIMILARITY.
FT   DISULFID   1064   1075       BY SIMILARITY.
FT   DISULFID   1069   1084       BY SIMILARITY.
FT   DISULFID   1086   1095       BY SIMILARITY.
FT   DISULFID   1102   1123       BY SIMILARITY.
FT   DISULFID   1117   1132       BY SIMILARITY.
FT   DISULFID   1134   1143       BY SIMILARITY.
FT   DISULFID   1150   1161       BY SIMILARITY.
FT   DISULFID   1155   1170       BY SIMILARITY.
FT   DISULFID   1172   1181       BY SIMILARITY.
FT   DISULFID   1188   1199       BY SIMILARITY.
FT   DISULFID   1193   1208       BY SIMILARITY.



Note: remainder of annotations omitted. 

Query Match 24.0%; Score 278; DB 1; Length 2444;

Best Local Similarity 39.6%; Pred. No. 1.51e-39;

Matches 38; Conservative 19; Mismatches 36; Indels 3; Gaps 2;

Db 932 CDCLPGFRGTFCEEDINECASDPCRNGANCTDCVDSYTCTCPAGFSGIHCE-NNTP--DC 988

hi: I: I II: ::! hill I I 1 : 1 1 : 1 I hll II :| I 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68

Db 989 TESSCFNGGTCVDGINSFTCLCPPGFTGSYCQHVVN 1024 

: I II: III : II III h h ::: 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLS 104 



RESULT 15 

ID NTC4_MOUSE STANDARD; PRT; 1964 AA.

AC P31695; Q62389; 

DT 01-JUL-1993 (REL. 26, CREATED) 

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 4 PRECURSOR (TRANSFORMING 

DE PROTEIN INT-3). 

GN NOTCH4 OR INT3 OR INT-3. 

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 92194507. 

RA ROBBINS J., BLONDEL B.J., GALLAHAN D., CALLAHAN R.;

RT "Mouse mammary tumor gene int-3: a member of the notch gene family 

RT transforms mammary epithelial cells."; 

RL J. VIROL. 66:2594-2599(1992).

RN [2] 

RP REVISIONS, SEQUENCE FROM N.A. 

RA CALLAHAN R.;

RL SUBMITTED (NOV-1996) TO EMBL/GENBANK/DDBJ DATA BANKS. 



Tue Jun 1 10:15:55 1999 



US-09-191-1 



■647-lO.rsp 
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RN 
RP 


[3] 








FT 


REPEAT 


1209 


1242 




SEQUENCE FROM N.A. 




FT 


REPEAT 


1243 


1282 




RC 


TISSUE-LUNG, AND TESTIS; 




FT 


DOMAIN 


1572 


1785 


6 X ANK MOTIF REPEATS.


RX 


MEDLINE; 


96281668 






FT 


REPEAT 


1572 


1603 


ANK MOTIF 1. 


RA 


UYTTENDAELE H., MARAZZI G.,


WU G., YAN Q., SASSOON D., KITAJEWSKI J.;


FT 


REPEAT 


1622 


1653 


ANK MOTIF 2. 


RT 


"Notch4/int-3, a mammary proto-oncogene, is an endothelial 


FT 


REPEAT 


1654 


1685 


ANK MOTIF 3. 


RT 


cell-specific mammalian Notch gene."; 


FT 


REPEAT 


1688 


1719 


ANK MOTIF 4.


RL 


DEVELOPMENT 122:2251-2259(1996). 




REPEAT 


1721 


1752 


ANK MOTIF 5.


CC 


-!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.


FT 


REPEAT 


1754 


1785 


ANK MOTIF 6.


CC 


-!- DISEASE: ACTIVATED INT-3 TRANSFORMS MAMMARY EPITHELIAL CELLS.


FT 


DISULFID 


25 


38 


BY SIMILARITY.


CC 


-!- SIMILARITY: CONTAINS 29 EGF-LIKE DOMAINS.


FT 


DISULFID 


32 


48 


BY SIMILARITY.


CC 


-!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS. 


FT 


DISULFID 


50 


59 


BY SIMILARITY.


CC 


-!- SIMILARITY: CONTAINS 6 CDC10/SWI6 REPEATS.


FT 


DISULFID 


65 


77 


BY SIMILARITY. 


CC 


-!- SIMILARITY: CONTAINS 6 ANK REPEATS.


FT 


DISULFID 


71 


100 


BY SIMILARITY. 


CC 












DISULFID 


102 


111 


BY SIMILARITY.


CC 


This SWISS-PROT entry is copyright. It is produced through a collaboration 


FT 


DISULFID 


119 


130 


BY SIMILARITY.


CC 


between 


the Swiss Institute of Bioinformatics and the EMBL outstation - 




DISULFID 


124 


140 


BY SIMILARITY.


CC 


the European Bioinformatics Institute. There are no restrictions on its 


FT 


DISULFID 


142 


151 


BY SIMILARITY.


CC 


use by 


non-profit institutions as long as its content is in no way 




DISULFID


157 


168 


BY SIMILARITY.


CC 


modified and this statement is not removed. Usage by and for commercial 


FT 


DISULFID 


162 


177 


BY SIMILARITY. 


CC 


entities requires 


a license 


agreement (See http://www.isb-sib.ch/announce/ 




DISULFID 


179 


188 


BY SIMILARITY.


CC


or send 


an email to license@isb-sib.ch).


FT 


DISULFID 


195 


208 


BY SIMILARITY.


IF 












DISULFID 


202 


217 


BY SIMILARITY.




EMBL; M80456; G1714084; -. 




FT 


DISULFID 


219 


228 


BY SIMILARITY.


DR 


EMBL; U43691; G1401160; -. 




FT 


DISULFID 


235 


246 


BY SIMILARITY.


DR 


PIR; A38072; TVMVT3. 






DISULFID 


240 


259 


BY SIMILARITY.


DR 


MGD; MGI; 107471; NOTCH4. 




FT 


DISULFID


261 


270 


BY SIMILARITY.


DR 


PROSITE; PS00010; ASX_HYDROXYL; 11.


FT 


DISULFID


235 


246 


BY SIMILARITY.


DR 


PROSITE; PS00022; 


EGF_1; 28.




FT


DISULFID


240 


259 


BY SIMILARITY.


DR 


PROSITE; PS01186; EGF_2; 21.






DISULFID


261 


270 


BY SIMILARITY.


DR 


PROSITE; PS01187; EGF_CA; 9.




FT 


DISULFID


277 


288 


BY SIMILARITY.


DR 


PFAM; PF00008; EGF; 26. 






DISULFID






BY SIMILARITY.


DR 


PFAM; PF00023; ank; 6. 




FT 


DISULFID


299 


308 


BY SIMILARITY.


DR 


PFAM; PF00066; notch; 2.






DISULFID






BY SIMILARITY.


DR 


HSSP; P00740; 1IXA. 




FT 


DISULFID


323 


338 


BY SIMILARITY.


KW 


DIFFERENTIATION; NEUROGENESIS; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE;


FT 


DISULFID


340 


349 


BY SIMILARITY.


KW 


GLYCOPROTEIN; PROTO-ONCOGENE; ANK REPEAT; SIGNAL. 




DISULFID


356 


367 


BY SIMILARITY.


FT 


SIGNAL 


1 


20 


POTENTIAL. 


FT 


DISULFID


361 


376 


BY SIMILARITY.


FT 


CHAIN 


21 


1964 


NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 4. 


FT 


DISULFID






BY SIMILARITY.


FT 


DOMAIN 


21 


1443 


EXTRACELLULAR (POTENTIAL). 


FT 


DISULFID


393 


AfU 


BY SIMILARITY.


FT 


TRANSMEM 


1444 


1464 


POTENTIAL. 




DISULFID


398 


AIR 


BY SIMILARITY.


FT 


DOMAIN 


1465 


1964 


CYTOPLASMIC (POTENTIAL).


FT 


DISULFID






BY SIMILARITY.


FT 


DOMAIN 


21 


60 


EGF-LIKE 1.




DISULFID


433 


449 


BY SIMILARITY.


FT 


DOMAIN 


61 


112 


EGF-LIKE 2.


FT 


DISULFID


443 


458 


BY SIMILARITY.


FT 


DOMAIN 


115 


152 


EGF-LIKE 3.


FT 


DISULFID 


460 


469 


BY SIMILARITY.


FT 


DOMAIN 


153 


189 


EGF-LIKE 4. 




DISULFID


476 


487 


BY SIMILARITY.


FT 


DOMAIN 


191 


229 


EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).


FT 


DISULFID 


481 


496 


BY SIMILARITY.


FT 


DOMAIN 


231 


271 


EGF-LIKE 6. 


FT 


DISULFID 


498 


507 


BY SIMILARITY.


FT 


DOMAIN 


273 


309 


EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).


FT 


DISULFID 


514 


525 


BY SIMILARITY.




DOMAIN 


311 


350 


EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).


FT 


DISULFID 


519 


534 


BY SIMILARITY.


FT


DOMAIN 


352 


388 


EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID


536 


545 


BY SIMILARITY.


FT


DOMAIN 


389 


427 


EGF-LIKE 10. 


FT 


DISULFID 


552 


563 


BY SIMILARITY. 


FT 


DOMAIN


429 


470 


EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


557 


572 


BY SIMILARITY.


FT 


DOMAIN 


472 


508 


EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


574 


583 


BY SIMILARITY.


FT 


DOMAIN 


510 


546 


EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).


FT 


DISULFID 


590 


601 


BY SIMILARITY.


FT 


DOMAIN 


548 


584 


EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).


FT 


DISULFID


595 


610 


BY SIMILARITY.


FT 


DOMAIN 


586 


622 


EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID


612 


621 


BY SIMILARITY.


FT 


DOMAIN 


622 


656 


EGF-LIKE 16. 


FT 


DISULFID


626 


637 


BY SIMILARITY.


FT 


DOMAIN 


658 


686 


EGF-LIKE 17.


FT 


DISULFID 


631 


646 


BY SIMILARITY.


FT 


DOMAIN 


688 


724 


EGF-LIKE 18. 


FT 


DISULFID


648 


655 


BY SIMILARITY.


FT 


DOMAIN 


726 


762 


EGF-LIKE 19. 




DISULFID




ci? 


BY SIMILARITY.


FT 


DOMAIN 


764 


800 


EGF-LIKE 20. 


FT




004 


674 


BY SIMILARITY.


FT 


DOMAIN 


803 


839 


EGF-LIKE 21. 


FT 


DISULFID






BY SIMILARITY.


FT 


DOMAIN 


841 


877 


EGF-LIKE 22. 


FT 


DISULFID 


692 


703 


BY SIMILARITY.


FT 


DOMAIN 


878 


924 


EGF-LIKE 23. 


FT 


DISULFID 


697 


712 


BY SIMILARITY. 


FT 


DOMAIN 


926 


962 


EGF-LIKE 24. 


FT 


DISULFID 


714 


723 


BY SIMILARITY. 


FT 


DOMAIN 


964 


1000 


EGF-LIKE 25. 


FT 


DISULFID 


730 


741 


BY SIMILARITY.


FT 


DOMAIN 


1002 


1040 


EGF-LIKE 26. 


FT 


DISULFID 


735 


750 


BY SIMILARITY. 


FT 


DOMAIN 


1042 


1081 


EGF-LIKE 27. 


FT 


DISULFID 


752 


761 


BY SIMILARITY. 


FT 


DOMAIN 


1083 


1122 


EGF-LIKE 28. 


FT 


DISULFID 


768 


779 


BY SIMILARITY. 


FT 


DOMAIN 


1126 


1167 


EGF-LIKE 29. 


FT 


DISULFID 


773 


788 


BY SIMILARITY. 


FT 


DOMAIN 


1168 


1282 


3 X LIN/NOTCH REPEATS. 


FT 


DISULFID 


790 


799 


BY SIMILARITY. 


FT 


REPEAT 


1168 


1208 


LIN/NOTCH 1. 


FT 


DISULFID 


807 


818 


BY SIMILARITY, 
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FT 


DISULFID 


812 


827 


BY SIMILARITY. 


FT 


DISULFID 


829 


838 


BY SIMILARITY. 


FT 


DISULFID 


845 


856 


BY SIMILARITY. 


FT 


DISULFID 


850 


865 


BY SIMILARITY. 


FT 


DISULFID 


867 


876 


BY SIMILARITY. 


FT 


DISULFID 


882 


903 


BY SIMILARITY. 


FT 


DISULFID 


897 


912 


BY SIMILARITY. 


FT 


DISULFID 


914 


923 


BY SIMILARITY. 


FT 


DISULFID 


930 


941 


BY SIMILARITY. 


FT 


DISULFID 


935 


950 


BY SIMILARITY. 


FT 


DISULFID 


952 


961 


BY SIMILARITY.


FT 


DISULFID 


968 


979 


BY SIMILARITY. 


FT 


DISULFID 


973 


988 


BY SIMILARITY.


FT 


DISULFID 


990 


999 


BY SIMILARITY. 


FT 


DISULFID 


1006 


1019 


BY SIMILARITY. 


FT 


DISULFID 


1011 


1028 


BY SIMILARITY. 


FT


DISULFID 


1030 


1039 


BY SIMILARITY.


FT


DISULFID 


1046 


1057 


BY SIMILARITY.


FT


DISULFID 


1051 


1069 


BY SIMILARITY. 


FT 


DISULFID 


1071 


1080 


BY SIMILARITY.


FT 


DISULFID 


1087 


1098 


BY SIMILARITY.


FT 


DISULFID 


1092 


1110 


BY SIMILARITY. 


FT 


DISULFID 


1112 


1121 


BY SIMILARITY.


FT 


DISULFID 


1130 


1142 


BY SIMILARITY. 


FT 


DISULFID 


1136 


1155 


BY SIMILARITY. 


FT 


DISULFID 


1157 


1166 


BY SIMILARITY. 


FT 


CARBOHYD 


711 


711 


POTENTIAL. 


FT 


CARBOHYD 


960 


960 


POTENTIAL. 


FT 


CARBOHYD 


1139 


1139 


POTENTIAL. 


FT 


CONFLICT 


43 


43 


Q -> R (IN REF. 3). 


FT 


CONFLICT 


298 


298 


L -> P (IN REF. 3).


FT 


CONFLICT 


884 


884 


M -> K (IN REF. 3).



Note: remainder of annotations omitted. 



Query Match 23.4%; Score 272; DB 1; Length 1964; 

Best Local Similarity 38.7%; Pred. No. 3.45e-38;

Matches 36; Conservative 18; Mismatches 36; Indels 3; Gaps 2; 

Db 458 CLCLPGYTGSRCEADHNECLSQPCHPGSTCLDLLATFHCLCPPGLEGRLCEVE-V-NEC 514 

I h Mil I ::::| : |: h |:| : :: III' I |:|||: : I 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 



Db 515 TSNPCLNQAACHDLLNGFQCLCLPGFTGARCEK 547

: I I I I I : I Mill I: III 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEK 101



Search completed: Fri May 28 09:20:39 1999 
Job time : 22 secs.
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^rch_pp 

Run on: 



Release 3.1A John F. Collins, Biocomputing Research Unit. 
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

protein - protein database search, using Smith-Waterman algorithm

Fri May 28 09:20:58 1999; MasPar time 12.73 Seconds 

660.074 Million cell updates/sec 

Tabular output not generated. 

Title: MJS-09-191-647-10 

Description: (1-154) from US09191647.pep

Perfect Score: 1160 

Sequence: 1 DPLPVHHRCECMLGYTGDNC EDNGILLYNGDNDHIAVELY 154 

Scoring table: PAM 150 
Gap 11 



179066 seqs, 54579741 residues 

Minimum Match 0% 

Listing first 45 summaries 

sptrembl9 

1:sp_archea 2:sp_bacteria 3:sp_fungi 4:sp_human
5:sp_invertebrate 6:sp_mammal 7:sp_mhc 8:sp_organelle
9:sp_phage 10:sp_plant 11:sp_rodent 12:sp_unclassified
13:sp_vertebrate 14:sp_virus



Post-processing: 



Statistics: Mean 39.710; Variance 79.104; scale 0.502



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution.
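
In the result summaries of this run, the Query Match percentage appears to be the hit score expressed as a fraction of the Perfect Score from the run header (for example, 1058 against a Perfect Score of 1160). The sketch below, in Python, checks that reading against a few of the listed hits. It is an interpretation of the printed figures, not a statement of how the search program computes them, and the function name `query_match_percent` is illustrative only.

```python
def query_match_percent(score: int, perfect_score: int) -> float:
    """Return the hit score as a percentage of the perfect score, to one decimal.

    Assumption: "Query Match" = 100 * Score / Perfect Score. This is inferred
    from the printed values and is not documented in the output itself.
    """
    return round(100.0 * score / perfect_score, 1)


# "Perfect Score: 1160" is taken from the run header of this search.
PERFECT_SCORE = 1160

# (description, score) pairs copied from the summaries of this run.
hits = [("MEGF4", 1058), ("MEGF5", 625), ("X-DELTA-1", 333)]

for description, score in hits:
    print(description, score, query_match_percent(score, PERFECT_SCORE))
```

Under this assumption the three hits come out at 91.2, 53.9 and 28.7, which agrees with the Query Match values printed for them.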



                              SUMMARIES

             Query
 No.  Score  Match  Length  DB  ID      Description             Pred. No.
   1   1058   91.2    1531  11  O88279  MEGF4.                  1.61e-191
   2    625   53.9    1523  11  O88280  MEGF5.                  1.96e-101
   3    611   52.7     739   4  O75094  MEGF5 (FRAGMENT).       1.44e-98
   4    448   38.6     530   5  Q24526  SLIT LOCUS ENCODING A   1.42e-65
   5    443   38.2     601   5  Q20204  F40E10.4 PROTEIN (FRAG  1.42e-64
   6    333   28.7     721  13  Q91902  X-DELTA-1.              6.30e-43
   7    326   28.1     529   5  Q25058  FIBROPELLIN IA (FRAGME  1.42e-41
   8    325   28.0     728  13  Q90656  TRANSMEMBRANE PROTEIN   2.21e-41
   9    320   27.6     406   5  Q25059  FIBROPELLIN III (FRAGM  2.03e-40
  10    313   27.0     717  13  P87357  DELTAD TRANSMEMBRANE P  4.51e-39
  11    313   27.0    1722   5  Q19350  SIMILAR TO EGF-LIKE RE  4.51e-39
  12    308   26.6     802  13  O57462  DELTAA.                 4.10e-38
  13    300   25.9    2408   4  Q92566  MYELOBLAST KIAA0279 (F  1.39e-36
  14    299   25.8    1203  11  Q06008  NOTCH PROTEIN HOMOLOG   2.15e-36
  15    299   25.8    2470  11  O35516  CELL SURFACE PROTEIN.   2.15e-36
  16    299   25.8    2531   5  O16004  NOTCH HOMOLOG.          2.15e-36
  17    296   25.5     752  13  O42374  NOTCH RECEPTOR PROTEIN  8.02e-36
  18    295   25.4    2352   5  O61240  HRNOTCH PROTEIN.        1.24e-35
  19    294   25.3    2653   5  Q25253  NOTCH HOMOLOG SCALLOPE  1.93e-35
  20    292   25.2    1218   4  O14902  TRANSMEMBRANE PROTEIN   4.63e-35
  21    292   25.2    1218   4  Q15816  TRANSMEMBRANE PROTEIN
  22    292   25.2    1219  11  Q63722  JAGGED PROTEIN.
  23    292   25.2    1227   4  P78504                          4.63e-35
  24    291   25.1    2447  13  O13149  NOTCH 2 (FRAGMENT).     7.17e-35
  25    290   25.0    1193  13  Q90819  C-SERRATE-1 PROTEIN (PR 1.11e-34
  26    290   25.0    1218   4  O15122
  27    285   24.5    1212  13  O42347  C-SERRATE-2 (FRAGMENT)
  28    284   24.5    3313  11  O88278
  29    283   24.4    1372   5  P91526
  30    277   23.9     615  13  O57409                          3.20e-32
  31    273   23.5     642  13  P79941  NOTCH LIGAND X-DELTA-2  1.81e-31
  32    272   23.4    1964  11  O35442  NOTCH4.
  33    270   23.3     156   5  Q26661                          5.65e-31
  34    270   23.3     955   4  Q99466  NOTCH4 (FRAGMENT).      5.65e-31
  35    270   23.3    1999   4  Q99940  NOTCH4.                 5.65e-31
  36    270   23.3    2003   4  O00306  NOTCH4.                 6.65e-31
  37    265   22.8     263   4  Q99734  NOTCH2 TRANSMEMBRANE P  5.76e-30
  38    262   22.6    1476  13  Q90285  PUTATIVE EXTRACELLULAR  2.10e-29
  39    261   22.5    1687  11  O61204  NOTCH2-LIKE (EGF REPEA  3.22e-29
  40    248   21.4     434  11  O55139  JAGGED2 PROTEIN (FRAGM  8.44e-27
  41    248   21.4     518  11  O70219  JAGGED 2 (JAGGED 2 PRO  8.44e-27
  42    246   21.2    1202  11  P97607  JAGGED2 (FRAGMENT).     1.98e-26
  43    245   21.1     585  11  O35675  M-DELTA-LIKE 3 GENE PR  3.03e-26
  44    245   21.1     592  11  O88516  DELTA-LIKE 3 ALTERNATE  3.03e-26
  45    244   21.0    1827   5  Q20535  SIMILARITY TO EGF-TYPE  4.63e-26



RESULT 1
ID O88279 PRELIMINARY; PRT; 1531 AA.
AC O88279;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF4.
GN MEGF4.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1] 
RP SEQUENCE FROM N.A. 
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089. 

RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;

RT "Identification of high-molecular-weight proteins with multiple

RT EGF-like motifs by motif-trap screening.";

RL GENOMICS 51:27-34(1998).

DR EMBL; AB011530; D1033423; -.

DR PROSITE; PS01185; CTCK_1; 1.

DR PROSITE; PS01186; EGF_2; 8.

DR PROSITE; PS01187; EGF_CA; 2.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 1531 AA; 167497 MW; 5C5EBDF4 CRC32;



Query Match 91.2%; Score 1058; DB 11; Length 1531;

Best Local Similarity 95.2%; Pred. No. 1.61e-191;

Matches 140; Conservative 3; Mismatches 4; Indels 0;



INh llllllll IIIINII lllllllllhlllllll llllllllllllllhl
RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSS 67



IIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIH


Db 


1068 


Qy 


8 


Db 


1128 


Qy 


68 


Db 


1188 


Qy 


128 



lllllllllllllllllllllllllll 
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ID 088280 PRELIMINARY; PRT; 1523 AA, 

AC O88280;

DT 01-NOV-1998 (TREMBLREL. 08, CREATED)

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE MEGF5.

GN MEGF5.

OS RATTUS NORVEGICUS (RAT).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;

OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS. 

RN [1] 

RP SEQUENCE FROM N.A.

RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN; 

RX MEDLINE; 98360089. 

RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;

RT "Identification of high-molecular-weight proteins with multiple

RT EGF-like motifs by motif-trap screening.";

RL GENOMICS 51:27-34(1998). 

DR EMBL; AB011531; D1033424; -. 

DR PROSITE; PS01185; CTCK_1; 1.

DR PROSITE; PS01186; EGF_2; 7.

DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1523 AA; 167767 MW; 2BD845D0 CRC32;

Query Match 53.9%; Score 625; DB 11; Length 1523;

Best Local Similarity 54.3%; Pred. No. 1.96e-101;

Matches 82; Conservative 33; Mismatches 32; Indels 4; Gaps 3;

Db 1059 RCECVPGYSGKLCETDNDDCVAHKCRHGAQCVDAVNGYTCICPQGFSGLFCEHPPPMVLL 1118 

Mil: 1 1 : 1 I : III lll::llllll l|:|:|:| :|:|| :|| ||: 
Qy 8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPA-P-- 64 

Db 1119 QTSPCDQYECQNGAQCIWQQEPTCRCPPGFAGPRCEKLITVNFVGKDSYVELASAKVRP 1178 

I: llllll I: I hi llhll lll|::|||| :|:|:::: :| 
Qy 65 RSS-CEGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFTDLQNWP 123 

Db 1179 QANISLQVATDKDNGILLYKGDNDPLALELY 1209 

:IM:|||:| llllllhllll :|:||| 
Qy 124 RANITLQVSTAEDNGILLYNGDNDHIAVELY 154 



RESULT 3 

ID O75094 PRELIMINARY; PRT; 739 AA.

AC O75094;

DT 01-NOV-1998 (TREMBLREL. 08, CREATED)

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE MEGF5 (FRAGMENT).

GN MEGF5.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;

OC CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE-BRAIN; 

RX MEDLINE; 98360089. 

RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;

RT "Identification of high-molecular-weight proteins with multiple

RT EGF-like motifs by motif-trap screening.";

RL GENOMICS 51:27-34(1998).

DR EMBL; AB011538; D1033429; -.

DR PROSITE; PS01185; CTCK_1; 1.

DR PROSITE; PS01186; EGF_2; 7.

DR PROSITE; PS01187; EGF_CA; 2.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 739 AA; 80364 MW; DC6BCB63 CRC32;

Query Match 52.7%; Score 611; DB 4; Length 739;

Best Local Similarity 52.2%; Pred. No. 1.44e-98;

Matches 82; Conservative 33; Mismatches 38; Indels 4; Gaps 3; 



Db 269 PLDKGFSCECVPGYSGKLCETDNDDCVAHKCRHGAQCVDTINGYTCTCPQGFSGPFCEHP 328 

II III: Ihl I : III III:: :|:|:| I :|:|| :|| 

Qy 2 PLPVHHRCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIP 61 

Db 329 PPMVLLQTSPCDQYECQNGAQCIWQQEPTCRCPPGFAGPRCEKLITVNFVGKDSYVELA 388 

I: -I h llllll |: I hi llhll lllhMIII :|:|:::: 
Qy 62 PA-P--RSS-CEGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFT 117 

Db 389 SAKVRPQANISLQVATDKDNGILLYKGDNDPLALELY 425 

:hllhllhl llllllhllll :hlll 
Qy 118 DLQNWPRANITLQVSTAEDNGILLYNGDNDHIAVELY 154 



RESULT 4 

ID Q24526 PRELIMINARY; PRT; 530 AA. 

AC Q24526; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED)

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE SLIT LOCUS ENCODING A PROTEIN ASSOCIATED WITH NEURAL DEVELOPMENT WITH 

DE 52D EGF HOMOLOGOUS DOMAINS (FRAGMENT).

OS DROSOPHILA MELANOGASTER (FRUIT FLY). 

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA.

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=CANTON S;

RX MEDLINE; 89077533. 

RA ROTHBERG J.M., HARTLEY D.A., WALTHER Z., ARTAVANIS-TSAKONAS S.;

RT "slit: an EGF-homologous locus of D. melanogaster involved in the 

RT development of the embryonic central nervous system."; 

RL CELL 55:1047-1059(1988). 

DR EMBL; M23543; G514357; -. 

DR FLYBASE; FBgn0003425; sli.

DR PROSITE; PS01186; EGF_2; 5.

DR PROSITE; PS01187; EGF_CA; 2.

DR PFAM; PF00008; EGF; 7.

DR PFAM; PF00054; laminin_G; 1.

KW NEUROGENESIS; GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 530 530

SQ SEQUENCE 530 AA; 59457 MW; 10E5764D CRC32; 

Query Match 38.6%; Score 448; DB 5; Length 530; 

Best Local Similarity 41.4%; Pred. No. 1.42e-65; 

Matches 65; Conservative 39; Mismatches 44; Indels 9; Gaps 5; 

Db 167 HYSCDCQAGFHGTNCTDNIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMIS 226 

I hi h I I:: III :|:|!|l: III :| | | | : |:| || |: 
Qy 6 HHRCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCE — IP 61 

Db 227 MMYPQTSPCQNHECKHGVCFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELE 286 

|::| h II :| III :|:| lh I II I |::|| :::::: 

Qy 62 PA- PRSS - CEGTECQNGA- - NCVDQGSRPVCQCLPGFGGPECEKLLSVNFVDRDT YLQFT 117 

Db 287 PLRTRPEANVTI -VFSSGQNGILMYDGQDAHLAVELF 322 

I: :| 1 1 : 1 : I " :lllhhh: hlllh 
Qy 118 DLQNWPRANITLQVSTAEDNGILLYNGDNDHIAVELY 154 



RESULT 5 

ID Q20204 PRELIMINARY; PRT; 601 AA. 

AC Q20204; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE) 

DE F40E10.4 PROTEIN (FRAGMENT). 

GN F40E10.4.

OS CAENORHABDITIS ELEGANS.

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA; 

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS. 

RN [1] 

RP SEQUENCE FROM N.A. 
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RA SMYER.; 

RL SUBMITTED (FEB-1996) TO EMBL/GENBANK/DDBJ DATA BANKS. 

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 94150718. 

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,

RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,

RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,

RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,

RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,

RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.

RT elegans.";
RL NATURE 368:32-38(1994).
DR EMBL; Z69792; E1346469; -.

DR PROSITE; PS01187; EGF_CA; 1.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 601 AA; 66669 MW; F8A72773 CRC32; 

Query Match 38.2%; Score 443; DB 5; Length 601; 

Best Local Similarity 42.5%; Pred. No. 1.42e-64; 

Matches 68; Conservative 27; Mismatches 56; Indels 9; Gaps 8;

Db 195 PINGSYSCMCSPGFTGNNCETNIDDCKNVECQNGGSCVDGILSYDCLCRPGYAGQYCEIP 254 

I: I I !:M:|I I Mil: MM: III : II III Ihll Nil 
Qy 2 PLPVHHRCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIP 61 

Db 255 PMMDMEYQKTDACQQSACGQGECVASQNSSDFTCKCHEGFSGPSCDRQMSVGFKNPGAYL 314 

I : I: II : I ::|| II II ||:|| |:: :|| | : ;|| 
Qy 62 PAPRSSCEGTE-CQ-N--G-ANCVD-QGSRP-VCQCLPGFGGPECEKLLSVNFVDRDTYL 114 

Db 315 ALDPLAS-D-GTITMTLRTTSKIGILLYYGDDHFVSAELY 352 

: I = : II: : I: lllll l|: :: Ml 
Qy 115 QFTDLQNWPRANITLQVSTAEDNGILLYNGDNDHIAVELY 154 



RESULT 6 

ID Q91902 PRELIMINARY; PRT; 721 AA. 

AC Q91902; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED)
DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE X-DELTA-1.

OS XENOPUS LAEVIS (AFRICAN CLAWED FROG).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; AMPHIBIA; BATRACHIA; ANURA; 

OC MESOBATRACHIA; PIPOIDEA; PIPIDAE; XENOPODINAE; XENOPUS. 

RN [1] 

RP SEQUENCE FROM N.A.

RX MEDLINE; 95319507.

RA HENRIQUE D., ADAM J., MYAT A., CHITNIS A., LEWIS J., ISH-HOROWICZ D.;

RT "Expression of a Delta homologue in prospective neurons in the

RT chick.";

RL NATURE 375:787-790(1995). 

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 95319503. 

RA CHITNIS A.B., HENRIQUE D., LEWIS J., ISH-HOROWICZ D., KINTNER C.R.;

RT "Primary neurogenesis in Xenopus embryos regulated by a homologue of 

RT the Drosophila neurogenic gene Delta."; 

RL NATURE 375:761-766(1995). 

DR EMBL; L42229; G807696; -. 

DR PROSITE; PS01186; EGF_2; 8. 

DR PROSITE; PS01187; EGF_CA; 2. 

DR PFAM; PF00008; EGF; 6. 

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 721 AA; 79922 MW; 028040EF CRC32; 



Query Match 28.7%; Score 333; DB 13; Length 721;

Best Local Similarity 42.3%; Pred. No. 6.30e-43;

Matches 44; Conservative 18; Mismatches 39; Indels 3; Gaps 2; 

Db 423 EDLGNSYICQCQEGFSGRNCDDNLDDCTSFPCQNGGTCQDGINDYSCTCPPGYIGKNCSM 482 

: I hi h:| II M III lllh I I :| hi I III I : 
Qy 1 DPLPVHHRCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEI 60 

Db 483 P-ITK-CEHNPCHNGATCHERNNRYVCQCARGYGGNNCQFLLP 523 

I : M hill I :: :| Mil Ml :|: ||: 
Qy 61 PPAPRSSCEGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLS 104 



RESULT 7 

ID Q25058 PRELIMINARY; PRT; 529 AA.

AC Q25058; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE FIBROPELLIN IA (FRAGMENT). 

OS HELIOCIDARIS ERYTHROGRAMMA (SEA URCHIN). 

OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; EUECHINOIDEA; 

OC ECHINACEA; ECHINOIDA; ECHINOMETRIDAE; HELIOCIDARIS.



RP SEQUENCE FROM N.A.
RA BISGROVE B.W.;

RL SUBMITTED (JUN-1995) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; L33861; G499686; -.

DR PROSITE; PS00577; AVIDIN; 1.

DR PROSITE; PS01186; EGF_2; 10.

DR PROSITE; PS01187; EGF_CA; 7.

DR PFAM; PF00008; EGF; 10.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 529 AA; 55543 MW; 6385F322 CRC32;

Query Match 28.1%; Score 326; DB 5; Length 529;

Best Local Similarity 44.6%; Pred. No. 1.42e-41;
Matches 41; Conservative 18; Mismatches 30; Indels 3; Gaps 2;

Db 273 CSCVQGFTGSDCETNINECASGPCQNGGTCVDGVNGFVCQCPPNYTGTYCEIS-L--DAC 329 

i I: Ml :| I ::| MM: Ml ||:: I I |:| I : : 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 330 SSMPCQNGATCVNVGANYICECPPGFAGQNCE 361 

: lllll lh I: MM IMM Ml 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100 



RESULT 8 

ID Q90656 PRELIMINARY; PRT; 728 AA. 

AC Q90656; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE TRANSMEMBRANE PROTEIN C-DELTA-1.

OS GALLUS GALLUS (CHICKEN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ARCHOSAURIA; AVES; 

OC NEOGNATHAE; GALLIFORMES; PHASIANIDAE; PHASIANINAE; GALLUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=SPINAL CORD;

RX MEDLINE; 95319507.

RA HENRIQUE D., ADAM J., MYAT A., CHITNIS A., LEWIS J., ISH-HOROWICZ D.;

RT "Expression of a Delta homologue in prospective neurons in the

RT chick.";

RL NATURE 375:787-790(1995).

DR EMBL; U26590; G882412; -.

DR PROSITE; PS01186; EGF_2; 8.

DR PROSITE; PS01187; EGF_CA; 2.

DR PFAM; PF00008; EGF; 6.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 728 AA; 79861 MW; 7439F575 CRC32; 
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Query Match 28,0%; Score 325; DB 13; Length 728; 

Best Local Similarity 44.8%; Pred. No. 2.21e-41;

Matches 43; Conservative 18; Mismatches 32; Indels 3; Gaps 2; 

Db 436 CQCQAGFTGRHCDDNVDDCASFPCVNGGTCQDGVNDYSCTCPPGYNGKNCSTP-VSR--C 492 

hi hi! :| :| III I: I || |:| I j|:| I I :| 

Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 493 EHNPCHNGATCHERSNRYVCECARGYGGLNCQFLLP 528 

I hill I ::::! I|:| hi :|: ||: 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLS 104 



RESULT 9 

ID Q25059 PRELIMINARY; PRT; 406 AA. 

AC Q25059; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE FIBROPELLIN III (FRAGMENT). 

OS HELIOCIDARIS ERYTHROGRAMMA (SEA URCHIN).

OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; EUECHINOIDEA;

OC ECHINACEA; ECHINOIDA; ECHINOMETRIDAE; HELIOCIDARIS.

RN [1]

RP SEQUENCE FROM N.A. 

RA BISGROVE B.W.; 

RL SUBMITTED (JUN-1995) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; L33862; G499688; -. 

DR PROSITE; PS00577; AVIDIN; 1. 

DR PROSITE; PS01186; EGF_2; 6. 

DR PROSITE; PS01187; EGF_CA; 5.

DR PFAM; PF00008; EGF; 7.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 406 AA; 43475 MW; 45E6EE2C CRC32;

Query Match 27.6%; Score 320; DB 5; Length 406;

Best Local Similarity 43.5%; Pred. No. 2.03e-40;

Matches 40; Conservative 19; Mismatches 30; Indels 3; Gaps 2; 

Db 74 CNCIPGFDGDNCENNINECASNPCQNGGVCIDGVNGFVCTCQPGYTGTLCETD-I--DEC 130 

hi: I: Nil :l ::| : lllh hi II:: I I 1 1 : 1 III | 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 131 ASNPCQNGGVCTDLVNMYTCDCLAGFTGSNCE 162 

: lllh I I : hlhll |::|| 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100 



RESULT 10

ID P87357 PRELIMINARY; PRT; 717 AA.

AC P87357;

DT 01-MAY-1997 (TREMBLREL. 03, CREATED) 

DT 01-MAY-1997 (TREMBLREL. 03, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE DELTAD TRANSMEMBRANE PROTEIN PRECURSOR.

GN DELTAD. 

OS BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII; 

OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA; 

OC CYPRINIDAE; RASBORINAE; DANIO.

RN [1] 

RP SEQUENCE FROM N.A.

RX MEDLINE; 97346722.

RA DORNSEIFER P., TAKKE C., CAMPOS-ORTEGA J.A.;

RT "Overexpression of a zebrafish homologue of the Drosophila neurogenic

RT gene Delta perturbs differentiation of primary neurons and somite

RT development.";

RL MECH. DEV. 63:159-171(1997).

DR EMBL; Y11760; E307461; -.

DR PROSITE; PS01186; EGF_2; 8.

DR PROSITE; PS01187; EGF_CA; 2.

DR PFAM; PF00008; EGF; 6.

KW SIGNAL; TRANSMEMBRANE; GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT SIGNAL 1 19 POTENTIAL.

FT CHAIN 20 717 DELTAD TRANSMEMBRANE PROTEIN.

SQ SEQUENCE 717 AA; 79061 MW; 5CC32ECA CRC32;

Query Match 27.0%; Score 313; DB 13; Length 717;

Best Local Similarity 40.8%; Pred. No. 4.51e-39;

Matches 40; Conservative 21; Mismatches 34; Indels 3; Gaps 2; 

Db 427 CQCPEGFTGTHCEDNIDECATYPCQNGGTCQDGLSDYTCTCPPGYTGKNC-TSAVNK--C 483 

hi 1:11 :| :| hi lllh I I :: hi I Ihl I :: ; I 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 484 LHNPCHNGATCHEMDNRYVCACIPGYGGRNCQFLLPEN 521 

hill I : :| II hlhll :h II: I 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLSVN 106 



RESULT 11 

ID Q19350 PRELIMINARY; PRT; 1722 AA. 

AC Q19350; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE SIMILAR TO EGF-LIKE REPEATS. NCBI GI: 1125776.

GN F11C7.4.

OS CAENORHABDITIS ELEGANS. 

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA; 

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=BRISTOL N2;

RX MEDLINE; 94150718.

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,

RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,

RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,

RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,

RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,

RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.

RT elegans.";

RL NATURE 368:32-38(1994). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=BRISTOL N2;

RA TAICH A., VETTER J.; 

RL SUBMITTED (JAN-1996) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; U42839; G1125776; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 5.

DR PROSITE; PS01186; EGF_2; 19.

DR PROSITE; PS01187; EGF_CA; 3.

DR PFAM; PF00008; EGF; 24. 

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 1722 AA; 188383 MW; CCFB86B8 CRC32; 

Query Match 27.0%; Score 313; DB 5; Length 1722; 

Best Local Similarity 41.9%; Pred. No. 4.51e-39; 

Matches 39; Conservative 18; Mismatches 34; Indels 2; Gaps 2;

Db 148 KCACPPGFVGDHCETDEDECKENFCQNGADCENLKGSYECKCLKGFSGKYCEIQ-D-KKQ 205 

: I h Ihl ::hlh: llllhl : lllh hi III : 
Qy 8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSS 67 

Db 206 CTSDYCHNNGQCISTGSDLSCKCSPGFDGAFCE 238

I : hi : h II II III h II 
Qy 68 CEGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100 



RESULT 12

ID O57462 PRELIMINARY; PRT; 802 AA.

AC O57462;

DT 01-JUN-1998 (TREMBLREL. 06, CREATED)

DT 01-JUN-1998 (TREMBLREL. 06, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE DELTAA. 

GN DELTAA. 

OS BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII;

OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA; 

OC CYPRINIDAE; RASBORINAE; DANIO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 98165392. 

RA APPEL B., EISEN J.S.; 

RT "Regulation of neuronal specification in the zebrafish spinal cord by 

RT Delta function.";

RL DEVELOPMENT 125:371-380(1998).

DR EMBL; AF030031; G2809389; -.

DR PROSITE; PS01186; EGF_2; 8.

KW GLYCOPROTEIN.

SQ SEQUENCE 802 AA; 88941 MW; 42F041BD CRC32;

Query Match 26.6%; Score 308; DB 13; Length 802;

Best Local Similarity 40.6%; Pred. No. 4.10e-38;

Matches 39; Conservative 22; Mismatches 32; Indels 3; Gaps 2; 

Db 469 CQCPDGFTGMNCDRAGDECSMYPCQNGGTCQEGASGYMCTCPPGYTGRNCS-SPVSR-C 525 

hi hll II hi llll: I : ::| I I l|:|: I :| :|| 
Qy 9 CECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSSC 68 

Db 526 QHNPCHNGATCHERNNRYVCACVSGYGGRNCQFLLP 561 

: hill I :| II |::|:|| :|: ||: 
Qy 69 EGTECQNGANCVDQGSRPVCQCLPGFGGPECEKLLS 104 



RESULT 13 




ID Q92566 PRELIMINARY; PRT; 2408 AA.

AC Q92566;

DT 01-FEB-1997 (TREMBLREL. 02, CREATED)

DT 01-FEB-1997 (TREMBLREL. 02, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE MYELOBLAST KIAA0279 (FRAGMENT).

GN KIAA0279.

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;

OC CATARRHINI; HOMINIDAE; HOMO.

RN [1]

RP SEQUENCE FROM N.A.

RC TISSUE=BRAIN;

RX MEDLINE; 97191544.

RA NAGASE T., SEKI N., ISHIKAWA K., OHIRA M., KAWARABAYASI Y., OHARA O.,

RA TANAKA A., KOTANI H., MIYAJIMA N., NOMURA N.;

RT "Prediction of the coding sequences of unidentified human genes. VI.

RT The coding sequences of 80 new genes (KIAA0201-KIAA0280) deduced by

RT analysis of cDNA clones from cell line KG-1 and brain.";

RL DNA RES. 3:321-329(1996).

DR EMBL; D87469; D1014097; -.

DR PROSITE; PS01186; EGF_2; 4.

DR PFAM; PF00008; EGF; 6.

DR PFAM; PF00028; cadherin; 5.

DR PFAM; PF00054; laminin_G; 1.

KW GLYCOPROTEIN.

FT NON_TER 1 1

SQ SEQUENCE 2408 AA; 261740 MW; CDBA2001 CRC32;

Query Match 25.9%; Score 300; DB 4; Length 2408;

Best Local Similarity 33.1%; Pred. No. 1.39e-36; 

Matches 46; Conservative 33; Mismatches 55; Indels 5; Gaps 5; 

Db 758 RCRCPPGFTGDYCETEVDLCYSRPCGPHGRCRSREGGYTCLCRDGYTGEHCEVS-ARSGR 816 

II I MM I : I I : I ::| :hlll Hhh II:: I : 
Qy 8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSS 67 



Db 817 CTPGVCKNGGTCVNLLVGGFKCDCPSGDFEKPYCQ-VTTRSFPAH-SFITFRGLRQRFHF 874 

I I lh II: |:| :| I I |: : : :| : ::: | |: : : 

Qy 68 CEGTECQNGANCVDQGSRPV-CQCLPG-FGGPECEKLLSVNFVDRDTYLQFTDLQNWPRA 125

Db 875 TLALSFATKERDGLLLYNG 893 

:| I :hllll| 
Qy 126 NITLQVSTAEDNGILLYNG 144 



RESULT 14 

ID Q06008 PRELIMINARY; PRT; 1203 AA. 

AC Q06008; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED)

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE NOTCH PROTEIN HOMOLOG 2 (MOTCH B PROTEIN) (FRAGMENT). 

GN NOTCH 2 OR MOTCH B. 

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A.

RC STRAIN=F1 (CBA X C57BL); TISSUE=WHOLE EMBRYO;

RX MEDLINE; 93178563.

RA LARDELLI M., LENDAHL U.;

RT "Motch A and motch B--two mouse Notch homologues coexpressed in a

RT wide variety of tissues.";

RL EXP. CELL RES. 204:364-372(1993).

DR EMBL; X68279; G287990; -.

DR MGD; MGI:97364; NOTCH2.

DR PFAM; PF00008; EGF; 27. 

DR PFAM; PF00066; notch; 1. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT. 

FT NON_TER 1 1

FT NON_TER 1203 1203

SQ SEQUENCE 1203 AA; 128982 MW; A5A95551 CRC32;

Query Match 25.8%; Score 299; DB 11; Length 1203;

Best Local Similarity 38.7%; Pred. No. 2.15e-36;

Matches 36; Conservative 20; Mismatches 34; Indels 3; Gaps 2; 

Db 124 HCECLKGYAGPRCEMDINECHSDPCQNDATCLDKIGGFTCLCMPGFKGVHCELE-V--NE 180 

:Hh Ihl I : ::| III I |:| : :::|||: |: | ||: : 
Qy 8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSS 67 

Db 181 CQSNPCVNNGQCVDKVNRFQCLCPPGFTGPVCQ 213 

h: II: III :| I I III II |: 
Qy 68 CEGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100 



RESULT 15 

ID O35516 PRELIMINARY; PRT; 2470 AA.

AC O35516;

DT 01-JAN-1998 (TREMBLREL. 05, CREATED)

DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE CELL SURFACE PROTEIN. 

GN NOTCH2. 

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A.

RC STRAIN=C57B/6; TISSUE=THYMUS;

RX MEDLINE; 93178563.

RA LARDELLI M., LENDAHL U.;

RT "Motch A and motch B--two mouse Notch homologues coexpressed in a

RT wide variety of tissues."; 

RL EXP. CELL RES. 204:364-372(1993). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=C57B/6; TISSUE=THYMUS;

RA HAMADA Y., HIGUCHI M., TSUJIMOTO Y.;

RL SUBMITTED (JUL-1994) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; D32210; D1022953; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS01186; EGF_2; 27.

DR PROSITE; PS01187; EGF_CA; 22. 

DR PFAM; PF00008; EGF; 34. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 2. 

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 2470 AA; 265325 MW; CA94E03A CRC32;

Query Match 25.8%; Score 299; DB 11; Length 2470;

Best Local Similarity 38.7%; Pred. No. 2.15e-36;

Matches 36; Conservative 20; Mismatches 34; Indels 3; Gaps 2;

Db 439 HCECLKGYAGPRCEMDINECHSDPCQNDATCLDKIGGFTCLCMPGFKGVHCELE-V--NE 495

:IH: 11:1 I ::| III I hi : :::|||: I: I l|: : 
Qy 8 RCECMLGYTGDNCSENQDDCKDHKCQNGAQCVDEVNSYACLCVEGYSGQLCEIPPAPRSS 67 




Db 496 CQSNPCVNNGQCVDKVNRFQCLCPPGFTGPVCQ 528

I:: II: III :| I I III II |: 
Qy 68 CEGTECQNGANCVDQGSRPVCQCLPGFGGPECE 100



Search completed: Fri May 28 09:21:48 1999 
Job time: 50 secs.

Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MasPar time 6.79 Seconds
344.712 Million cell updates/sec

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 09:23:41 1999;

Tabular output not generated.

Title: >US-09-191-647-11

Description: (1-110) from US09191647.pep

Perfect Score: 873

Sequence: 1 AFKCHHGQCHISDRGEPYCL GSSFVEEVERHLECGCRACS 110

Scoring table: PAM 150
Gap 11

Searched: 170751 seqs, 21266608 residues

Post-processing: Minimum Match 0%

Listing first 45 summaries

Database: a-geneseq35

1:part1 2:part2 3:part3 4:part4 5:part5 6:part6 7:part7
8:part8 9:part9 10:part10 11:part11 12:part12 13:part13
14:part14 15:part15 16:part16 17:part17 18:part18
19:part19 20:part20 21:part21 22:part22 23:part23
24:part24 25:part25 26:part26 27:part27 28:part28
29:part29 30:part30 31:part31 32:part32 33:part33
34:part34 35:part35 36:part36 37:part37 38:part38
39:part39

Statistics: Mean 28.127; Variance 120.970; scale 0.233

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
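
The Query Match percentages listed for this search are consistent with dividing each hit's Score by the Perfect Score of 873 reported in the run header. The short sketch below checks that arithmetic for the first three hits. The relationship is inferred from the listed values and is not stated by the program output, and the function name is hypothetical.

```python
def query_match_percent(score: int, perfect_score: int) -> float:
    """Express a hit score as a percentage of the perfect (self-match) score.

    Assumption: the report's "Query Match" column is 100 * Score / Perfect Score,
    rounded to one decimal place. This is inferred from the listed numbers.
    """
    if perfect_score <= 0:
        raise ValueError("perfect_score must be positive")
    return round(100.0 * score / perfect_score, 1)


PERFECT_SCORE = 873  # "Perfect Score" from the run header for this query

# (ID, Score) pairs copied from the first three summary rows.
for hit_id, score in [("W46966", 442), ("R25079", 167), ("W68510", 131)]:
    print(hit_id, query_match_percent(score, PERFECT_SCORE))
```

The same calculation can be used as a sanity check when a percentage in the listing has been damaged in transcription.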



                %
Result        Query
   No.  Score  Match  Length  DB  ID      Description               Pred. No.

     1    442   50.6    1534  30  W46966  Amino acid sequence o     1.90e-31
     2    167   19.1    1480   5  R25079  Drosophila SLIT prote     5.59e-06
     3    131   15.0    1872  36  W68510  Partial human Notch-3     6.56e-03
     4    131   15.0    2321  36  W49698  Human Notch3 protein.     6.56e-03
     5    129   14.8     374  22  W07663  Human transforming gr     9.62e-03
     6    129   14.8     374  27  W37497  Human TMP-2.              9.62e-03
     7    126   14.4     379   5  R25565  Beta-IG-M1.               1.71e-02
     8    121   13.9    1055  29  W44298  Human serrate 2 prote     4.40e-02
     9    121   13.9    1212  29  W44299  Human serrate 2.          4.40e-02
    10    121   13.9    1257  19  W05834  Human Serrate-2 (HJ2)     4.40e-02
    11    120   13.7     612  28  W39256  Human partial mature      5.31e-02
    12    120   13.7     737  28  W39257  Human membrane protei     5.31e-02
    13    118   13.5     121  12  R64230  HCT-15 contg. hTG-700     7.73e-02
    14    114   13.1     381  26  W35957  Human monocyte mature     1.63e-01
    15    114   13.1     381  26  W35730  Human cysteine rich p     1.63e-01
    16    114   13.1     727  21  W11719  C-Delta-1 polypeptide     1.63e-01
    17    114   13.1     740  21  W00876  P-Hol t& -1 nnl vnonf i   1.63e
    18    113   12.9     177   8  R40167  Recombinant growth fa     1.96e
    19    112   12.8      44   3  R15351  Tumour cell growth in     2.36e
    20    112   12.8      46   3  R15350  Tumour cell growth in     2.36e
    21    112   12.8     157  21  W11730  H-Delta-1 polypeptide     2.36e
    22    112   12.8     520  25  W18348  Proliferation and dif     2.36e
    23    112   12.8     660  21  W11725  H-Delta-1 polypeptide     2.36e
    24    112   12.8     702  25  W18349  Proliferation and dif     2.36e
    25    112   12.8     723  25  W18353  Proliferation and dif     2.36e
    26    111   12.7     183  30  W46968  Amino acid sequence o     2.85e
    27    111   12.7     228  30  W46967  Amino acid sequence o     2.85e
    28    109   12.5      44   7  R38212  Tumour cell prolifera     4.11e
    29    109   12.5      46   7  R41537  Tumour cell prolifera     4.11e
    30    109   12.5      46   9  R45226  P*l cancer cell growt     4.11e
    31    109   12.5      51   3  P50547  Protein analogue enco     4.11e
    32    109   12.5     179  37  W75100  Human secreted protei     4.11e
    33    108   12.4     383  10  R56166  Neuroendocrine tumor      4.94e
    34    107   12.3     385  10  R56167  Neuroendocrine tumor      5.94e
    35    106   12.1      50   2  P98501  num uuye ui iiulivc jiu   7.13e
    36    106   12.1      77  16  R80786  Human heparin binding     7.13e
    37    106   12.1      82  16  R92917  Met-Cys-Ala-Met-Ala-H     7.13e
    38    106   12.1     160   1  P91000  Transforming growth f     7.13e
    39    106   12.1     208  16  R80785  Human precursor hepar     7.13e
    40    106   12.1     208  16  R80787  Monkey precursor hepa     7.13e-
    41    106   12.1     208  16  R92897  Human HBEGF precursor     7.13e-
    42    106   12.1     208   4  R23998  EGF/HB-EHM.               7.13e-
    43    106   12.1     208  13  R66190  Diphtheria toxin (DT)     7.13e-
    44    106   12.1     333  16  R92915  HBEGF-linker-SAP.         7.13e-
    45    106   12.1     333  16  R92900  HBEGF Val Met saporin     7.13e-

RESULT 1

ID W46966 standard; Protein; 1534 AA.

AC W46966;

DT 06-JUL-1998 (first entry)

DE Amino acid sequence of a human slit-like polypeptide.

KW Slit-like protein; human; diagnosis; treatment; brain-specific disease;

KW cancer; antibody.

OS Homo sapiens.

FH Key Location/Qualifiers

FT Peptide 1..26

FT /note= "signal peptide"

FT Protein 27..1534

FT /note= "mature protein"

PN J10087699-A.

PD 07-APR-1998.

PF 15-JUL-1997; 205351.

PR 16-JUL-1996; JP-186219.

PA (ASAH ) ASAHI KASEI KOGYO KK.

DR WPI; 98-267127/24.

DR N-PSDB; V16978.

PT Human Slit-like protein - useful for diagnosis and treatment of

PT brain-specific diseases and cancers

PS Disclosure; Pages 31-35; 45pp; Japanese.

CC The present sequence represents a novel human slit-like protein (the

CC mature protein is claimed in Claim 1). The slit-like polypeptide is

CC useful for diagnosis and treatment of brain-specific diseases and

CC cancers. Antibodies directed against the protein, or its fragments

CC can also be used for diagnosing cancer.

SQ Sequence 1534 AA;

Query Match 50.6%; Score 442; DB 30; Length 1534;

Best Local Similarity 47.7%; Pred. No. 1.90e-31;

Matches 53; Conservative 24; Mismatches 33; Indels 1; Gaps 1; 

Db 1424 glqclhghcqasgtkgahcvcdpgfsgelceqesecrgdpvrdfhqvqrgyaicqttrpl 1483 
" I IN: I : hMIIII Mil: I h II: : Ml I h : • 
Qy 1 AFKCHHGQCHISDRGEPYCLCQPGFSGHHCEQENPCMGEIVREAIRRQKDYASCATASKV 60 



Db 1484 swecrgscpgqgccqglrlkrrkftfecsdgtsfaeevekptkcgcalca 1534
: :||lhl I III : ll||: |:|:||:|| III! ||| |: 
Qy 61 PIMECRGGC-GTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCRACS 110



RESULT 2 

ID R25079 standard; Protein; 1480 AA. 

AC R25079; 

DT 05-JAN-1993 (first entry) 

DE Drosophila SLIT protein involved in axon pathway development. 

KW Neurogenesis; EGF-like repeats; epidermal growth factor; TAGON; 

KW embryonic CNS; leucine-rich repeat; Flank-LRR-Flank; 

KW midline glial cells; axonogenesis; cell-cell interaction; ss.

OS Drosophila melanogaster. 

FH Key Location/Qualifiers 

FT peptide 1..36 

FT /label= signal

FT domain 73..294

FT /label= Flank_LRR_Flank_1

FT /note= "mediates adhesive events"

FT domain 295..518

FT /label= Flank_LRR_Flank_2

FT /note= "mediates adhesive events"

FT domain 519..714

FT /label= Flank_LRR_Flank_3

FT /note= "mediates adhesive events"

FT domain 715..910

FT /label= Flank_LRR_Flank_4

FT /note= "mediates adhesive events"

FT region 911..1150

FT /label= Tandem_EGF_like_repeats

FT /note= "involved in protein-protein interactions"

FT region 1353..1393

FT /label= 7th_EGF_like_repeat

FT /note= "involved in receptor-ligand interactions"

FT region 1394..1404

FT /label= alternative_splice_segment

FT /note= "developmentally regulated"

FT region 1405..1480

FT /label= C-terminal region

PN WO9210518-A. 

PD 25-JUN-1992.

PF 27-NOV-1991; 009055.

PR 07-DEC-1990; US-624135.

PA (UYYA ) UNIV YALE.

PI Artavanis-Tsakonas S, Rothberg JM; 

DR WPI; 92-234590/28. 

DR N-PSDB; Q25811. 

PT SLIT protein and sequence elements for treating 

PT neuro-degenerative disease - useful for Alzheimer's disease,

PT nerve damage and Parkinson's disease, for diagnosis of cancer

PS Claim 1; Page 84-89; 122pp; English.

CC The SLIT protein is necessary for normal development of the midline

CC of the CNS, partic. the midline glial cells, and for the

CC concomitant formation of the commisural axon pathways. The process 

CC is dependent on the level of SLIT protein expression. It appears 

CC that SLIT protein is excreted by the midline glial cells where it 

CC is synthesised and is eventually associated with the surfaces of 

CC axons that traverse them. The SLIT protein is tightly localised to

CC the muscle attachment sites and to the sites of contact between 

CC adjacent pairs of cardioblasts as they coalesce to form the lumen 

CC of the larval heart. The SLIT protein defines a new set of 

CC molecules (TAGONS) which play a key role in axon outgrowth and 

CC pathfinding. SLIT can be used as a nerve regenerative in 

CC neurodegenerative diseases such as Alzheimer's Disease, spinal cord 

CC injuries, brain injuries, crushed (optic) nerve, amyotrophic lateral

CC sclerosis, diabetes-caused nerve damage, Parkinson's Disease,

CC strokes, epilepsy, multiple sclerosis, paraplegia retinal

CC degeneration and facial nerve damage. The 4 "Flank-Leucine-rich

CC region-Flank" domains, the C-terminal region and part of the

CC alternative splice segment (i.e. GEGSTEPFTVT) are all individually

CC claimed as are molecules comprising at least 1 Flank-LRR-Flank domain

CC and at least 1 EGF-like repeat element from the SLIT protein. 

CC See also R29102. 

SQ Sequence 1480 AA; 



Query Match 19.1%; Score 167; DB 5; Length 1480; 

Best Local Similarity 29.4%; Pred. No. 5.59e-06;

Matches 35; Conservative 25; Mismatches 47; Indels 12; Gaps 12;

Db 1361 kcrrgsrcvpnsnardgyqckckhgqrgrycdqgegstepptvtaastcrkeqvreyyte 1420

H::| :| I: : I I I I I: hi I : I I |: I ::| : 
Qy 3 KCHHG-QC-HISDRGEPY-CLCQPGFSGHHCEQ-ENPCMGEIVREA-I-RR-Q-KDY-AS 53

Db 1421 ndcrsrqplkyakcvggcgnqccaakivrrrkvrmvcsnnrkyiknldivrkcgctkkc 1479 

h |:: I INI II : :||| h; :: ::: III : | 
Qy 54 CATASKVPIM-E-CRGGCGTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGC-RAC 109 



RESULT 3 

ID W68510 standard; Protein; 1872 AA. 

AC W68510; 

DT 06-JAN-1999 (first entry) 

DE Partial human Notch-3 protein.

KW Human; Notch3; transmembrane receptor; lateral inhibition; regulation; 

KW developmental cascade; neurogenic gene; mutant; neurological disorder; 

KW cerebral autosomal dominant arteriopathy; subcortical infarct; CADASIL; 

KW leukoencephalopathy. 



OS Homo sapiens.

FH Key Location/Qualifiers

FT Misc_difference 328

FT /note= "encoded by NAN"

FT Misc_difference 401

FT /note= "encoded by GNN"

FT Misc_difference 403

FT /note= "encoded by GNC"

FT Misc_difference 406

FT /note= "encoded by GNN"

FT Misc_difference 409

FT /note= "encoded by NNT"

FT Misc_difference 420

FT /note= "encoded by GNC"

FT Misc_difference 706

FT /note= "encoded by NNN"

FT Misc_difference 708

FT /note= "encoded by CCN"

FT Misc_difference 719

FT /note= "encoded by CGN"

FT Misc_difference 728

FT /note= "encoded by CNT"

FT Misc_difference 729

FT /note= "encoded by GTN"

FT Misc_difference 759..7£

FT /note= "encoded by NNN"

FT Misc_difference 1425

FT /note= "encoded by GNA"

PN FR2751985-A1.

PD 06-FEB-1998.

PF 01-AUG-1996; 009733.

PR 01-AUG-1996; FR-009733.





PA (INRM ) INSERM INST NAT SANTE & RECH MEDICALE.

PI Bach JF, Bousser MG, Joutel A, Tournier Lasserve E;

DR WPI; 98-133137/13.

DR N-PSDB; V57163.

PT Human Notch3 nucleic acids - and methods for identifying 

PT pre-disposition to cerebral autosomal dominant arteriopathy with 

PT sub-cortical infarcts and leukoencephalopathy 

PS Claim 2; Fig 1a-1g; 42pp; French.

CC This sequence represents a partial human Notch3 protein, a transmembrane

CC receptor protein involved in lateral inhibition and regulating

CC developmental cascades of neurogenic genes. Mutated Notch3 proteins

CC are thought to be involved in neurological disorders, especially of the

CC cerebral autosomal dominant arteriopathy with subcortical infarcts and

CC leukoencephalopathy (CADASIL) type. Blocking expression of a mutated

CC Notch3 gene or by substitution therapy with non-mutated Notch3 gene or

CC protein can be used to treat CADASIL or related disorders.

SQ Sequence 1872 AA;

Query Match 15.0%; Score 131; DB 36; Length 1872;

Best Local Similarity 42.1%; Pred. No. 6.56e-03;

Matches 16; Conservative 10; Mismatches 9; Indels 3; Gaps 3;

Db 989 cqaggqc-vdedsshycvcpegrtgshceqevdpclaq 1025

I: I II : : : Ihl I :| Mill :||::: 
Qy 4 CHHG-QCHISDRGEPYCLCQPGFSGHHCEQE-NPCMGE 39



RESULT 4 

ID W49698 standard; Protein; 2321 AA. 

AC W49698; 

DT 21-DEC-1998 (first entry) 

DE Human Notch3 protein. 

KW Human; Notch3; transmembrane receptor; lateral inhibition; regulation; 

KW developmental cascade; neurogenic gene; mutant; neurological disorder; 

KW cerebral autosomal dominant arteriopathy; subcortical infarct; CADASIL;

KW leukoencephalopathy; therapy.

OS Homo sapiens.

PN FR2751986-A1.

PD 06-FEB-1998.

PF 16-APR-1997; 004680.

PR 01-AUG-1996; FR-009733.

PA (INRM ) INSERM INST NAT SANTE & RECH MEDICALE.

PI Bach JF, Bousser MG, Joutel A, Tournier Lasserve E; 

DR WPI; 98-133138/13. 

DR N-PSDB; V57001. 

PT Human Notch3 nucleic acids - and methods for identifying

PT pre-disposition to cerebral autosomal dominant arteriopathy with

PT sub-cortical infarcts and leukoencephalopathy

PS Claim 2; Fig 1.1-1.8; 45pp; French.

CC This sequence represents the human Notch3 protein, a transmembrane 

CC receptor protein involved in lateral inhibition and regulating 

CC developmental cascades of neurogenic genes. Mutated Notch3 proteins 

CC are thought to be involved in neurological disorders, especially of 

CC the cerebral autosomal dominant arteriopathy with subcortical infarcts 

CC and leukoencephalopathy (CADASIL) type. Blocking expression of a 

CC mutated Notch3 gene or by substitution therapy with non-mutated Notch3

CC gene or protein can be used to treat CADASIL or related disorders.

SQ Sequence 2321 AA;

Query Match 15.0%; Score 131; DB 36; Length 2321;

Best Local Similarity 42.1%; Pred. No. 6.56e-03;

Matches 16; Conservative 10; Mismatches 9; Indels 3; Gaps 3;

Db 1055 cqaggqc-vdedsshycvcpegrtgshceqevdpclaq 1091
I: I II : : : Ihl I :| ||||| :||:;: 
Qy 4 CHHG-QCHISDRGEPYCLCQPGFSGHHCEQE-NPCMGE 39



RESULT 5 

ID W07663 standard; Protein; 374 AA.

AC W07663;

DT 30-JUL-1997 (first entry)

DE Human transforming growth factor alpha HII.

KW TGFalpha-HII; AIDS; dementia; ocular disease; kidney disorder; 

KW liver disorder; hair follicle development; angiogenesis; ulcer; 

KW corneal incision; embryogenesis; gene therapy; neoplasia; 

KW psoriasis. 

OS Homo sapiens. 

PN WO9636709-A1. 

PD 21-NOV-1996.

PF 19-MAY-1995; U06386.

PR 19-MAY-1995; WO-U06386. 

PR 12-JUN-1995; ZA-004848. 

PA (HUMA-) HUMAN GENOME SCI INC. 

PI Meissner PS, Ni J, Wei Y; 

DR WPI; 97-012084/01. 

DR N-PSDB; T45428. 

PT Nucleic acid encoding human transforming growth factor-alpha HII - 

PT useful for treating, e.g. ocular diseases, kidney and liver 

PT disorders, or to stimulate wound healing etc 

PS Claim 19; Page 48-49; 73pp; English. 



CC The present sequence represents the 374 amino acid sequence for 

CC human transforming growth factor alpha HII (TGFalpha-HII). Human

CC TGFalpha-HII can be used to stimulate wound healing; to restore normal 

CC neurological functioning after trauma or AIDS, dementia; to treat ocular 

CC diseases, kidney and liver disorders; promote hair follicle development; 

CC to stimulate angiogenesis for treating burns, ulcers, corneal incisions; 

CC and to stimulate embryogenesis. The TGFalpha-HII can be used directly or 

CC is generated in situ, i.e. by gene therapy. Antagonists of TGFalpha-HII 

CC are useful for treating neoplasia and for treating certain skin 

CC disorders, such as psoriasis. Detecting mutations in the polynucleotide 

CC sequence is used for diagnosing diseases (or susceptibility to diseases) 

CC which result from underexpression of TGFalpha-HII. 

SQ Sequence 374 AA; 

Query Match 14.8%; Score 129; DB 22; Length 374;

Best Local Similarity 31.0%; Pred. No. 9.62e-03;

Matches 27; Conservative 19; Mismatches 35; Indels 6; Gaps

Db 271 gf-cmhgkcehsinmqepscrcdagytgqhcekkdysvlyvvpgpvrfqyvlia-avigt 328

:| 1 II 1 1 = III l::h:|:||| : : :| ::| | : 1 : 
Qy 1 AFKCHHGQC-HISDRGEPYCLCQPGFSGHHCEQENPCMGEIVREAIRRQKDYASCATASK 59

Db 329 iqiavicvvvlcitrkcprsnrihrqk 355

: 1 : 1 1 1 | :: | |:| 
Qy 60 VPI-MECRGG-CGT-TCCQPIRSKRRK 83



RESULT 6 

ID W37497 standard; Protein; 374 AA. 

AC W37497; 

DT 20-APR-1998 (first entry) 

DE Human TMP-2. 

KW Human; foetal brain cDNA library; GDP dissociation stimulating protein; 

KW brain specific nucleosome assembly protein; diagnosis; therapy; 

KW skeletal muscle specific ubiquitin conjugating enzyme; TMP-2; NPIK; 

KW nel-related protein type 1; nel-related type 2; hereditary disease; 

KW cancer. 

OS Homo sapiens. 

PN EP-796913-A2. 

PD 24-SEP-1997. 

PF 19-MAR-1997; 104842. 

PR 05-MAR-1997; JP-069163. 

PR 19-MAR-1996; JP-063410. 

PA (SAKA ) OTSUKA PHARM CO LTD. 

PI Fujiwara T, Horie M, Watanabe T; 

DR WPI; 97-459830/43. 

DR N-PSDB; V01874, V01875. 

PT Novel human genes, e.g. brain-specific nucleosome assembly protein - 

PT useful for diagnosis or therapy of hereditary disease and cancer 

PS Claim 10; Page 67-68; 123pp; English.

CC The present sequence represents a TMP-2 isolated from a human foetal 

CC brain cDNA library. The nucleotide or amino acid sequences are useful 

CC for in-vitro diagnosis of hereditary diseases and cancer and for 

CC preparation of pharmaceuticals. 

SQ Sequence 374 AA; 

Query Match 14.8%; Score 129; DB 27; Length 374; 

Best Local Similarity 31.0%; Pred. No. 9.62e-03;

Matches 27; Conservative 19; Mismatches 35; Indels 6; Gaps 6; 

Db 271 gf-cmhgkcehsinmqepscrcdagytgqhcekkdysvlyvvpgpvrfqyvlia-avigt 328 

:| I II I I : III |::|::|:||| : : :| ::| I : | : 
Qy 1 AFKCHHGQC-HISDRGEPYCLCQPGFSGHHCEQENPCMGEIVREAIRRQKDYASCATASK 59 

Db 329 iqiavicvvvlcitrkcprsnrihrqk 355 

: I : I I I I :: I 1:1 
Qy 60 VPI-MECRGG-CGT-TCCQPIRSKRRK 83 



RESULT 7

ID R25565 standard; Protein; 379 AA.

AC R25565;

DT 18-JAN-1993 (first entry) 






DE Beta-IG-M1.

KW Transforming growth factor beta; induced; CEF-10; vsrc; chicken;

KW embryo; fibroblasts; TGF-beta.

OS Mus musculus.

PN EP-495674-A. 

PD 22-JUL-1992. 

PF 17-JAN-1992; 300429. 

PR 18-JAN-1991; US-642991. 

PR 10-JAN-1992; US-816270. 

PA (BRIM ) BRISTOL-MYERS SQUIBB CO. 

PI Brunner AM, Chinn J, Neubauer MG, Purchio AF; 

DR WPI; 92-243508/30. 

DR N-PSDB; Q26421. 

PT TGF-beta induced gene family - encodes proteins involved in 

PT growth and differentiation effects of TGF-beta-1 

PS Claim 2; Fig 1; 35pp; English. 

CC The protein sequence was deduced from the DNA sequence obtd. by 

CC screening a cDNA library made from AKR-2B mouse cells induced with 

CC TGF-beta1 and cyclohexamide with two probes from untreated AKR-2B

CC mRNA and AKR-2B mRNA from cells treated with cyclohexamide and TGF-

CC beta1. The proteins encoded by hybridising colonies (beta-IG-M1 and

CC beta-IG-M2) contain 38 Cys residues and are induced by TGF-beta1.

CC Beta-IG-M1 displays 80 percent homology to the CEF-10 protein

CC induced by v-src in chicken embryo fibroblasts and is identical

CC to the protein encoded by cyr61, an immediate early response gene 

CC induced in quiescent BALB 3T3 cells by serum treatment. Residues 

CC 49-56 of beta-IG-Ml conform to the GCGCCXXC motif reported in the 

CC amino half of insulin-like growth factor (IGF) binding proteins. 

CC The C-terminal Cys rich region of beta-IG-Ml, -M2 and CEF-10 contain 

CC an amino acid sequence with strong homology to a motif found near the 

CC C-terminal of the malarial circumsporozoite (CS) protein, which is 

CC highly conserved among all species of malarial parasites sequenced 

CC to date (designated region II). This motif is also found in 

CC other proteins which have cell adhesive properties that mediate 

CC cell-cell and cell-extracellular matrix interactions, such as 

CC properdin, thrombospondin, and TRAP. The proteins encoded by 

CC TGF-beta induced genes are likely to be involved in mediation of 

CC the biological effects of TGF-beta relating to cell growth and 

CC differentiation. See also R25566. 

SQ Sequence 379 AA; 

Query Match 14.4%; Score 126; DB 5; Length 379; 

Best Local Similarity 32.3%; Pred. No. 1.71e-02; 

Matches 20; Conservative 12; Mismatches 27; Indels 3; Gaps 3; 

Db 298 yagcssvkkyrpkyc-gscvdgrcctplqtrtvkmrfrcedgemfsknvmmiqsckcnyn 356 

11:1:: I : I hi II |:::: I hi II I :| II 
Qy 51 YASCATASKVPIMECRGGC-GTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCR-A 108 

Db 357 cp 358 

Qy 109 CS 110 



RESULT 8 

ID W44298 standard; Protein; 1055 AA. 

AC W44298; 

DT 19-JUN-1998 (first entry) 

DE Human serrate 2 protein fragment. 

KW Human; serrate 2; regulation; stem cell; differentiation; neoplasm; 

KW leukaemia; endothelial cell; tumour. 

OS Homo sapiens. 

PN WO9802458-A1. 

PD 22-JAN-1998. 

PF 11-JUL-1997; J02414. 

PR 14-MAY-1997; JP-124063. 

PR 16-JUL-1996; JP-186220. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 98-110528/10. 

DR N-PSDB; V15181. 

PT Human serrate-2 gene expression products - used to regulate stem 

PT cell differentiation, useful in treating neoplasms, e.g. leukaemia 



PS Claim 2; Page 57-62; 103pp; Japanese. 

CC The present sequence represents a human serrate 2 protein fragment. The 

CC present invention also describes a method for the preparation of the 

CC polypeptides, and antibodies binding to the polypeptide and its 

CC fragments. The polypeptide and its fragments expressed by the serrate-2 

CC gene can be used to inhibit stem (especially blood stem) cell 

CC differentiation and to inhibit endothelial cell growth. They may be 

CC incorporated in a cell culture media for culturing undifferentiated 

CC stem cells. They can also be used for treatment of neoplasms such as 

CC leukaemia. The antibodies can be used for the diagnosis of malignant 

CC tumours. 

SQ Sequence 1055 AA; 

Query Match 13.9%; Score 121; DB 29; Length 1055; 

Best Local Similarity 34.1%; Pred. No. 4.40e-02; 

Matches 15; Conservative 15; Mismatches 10; Indels 4; Gaps 4; 

Db 580 cgphgrc-vsqpggnfscicdsgftgtycheniddclgqpcrng 622 

I Ihl :|: I : |:|::||:| I :: : hh |:: 
Qy 4 C-HHGQCHISDRGEPY-CLCQPGFSGHHCEQE-NPCMGEIVREA 44 



RESULT 9 

ID W44299 standard; Protein; 1212 AA. 

AC W44299; 

DT 19-JUN-1998 (first entry) 

DE Human serrate 2. 

KW Human; serrate 2; regulation; stem cell; differentiation; neoplasm; 

KW leukaemia; endothelial cell; tumour. 

OS Homo sapiens. 

PN WO9802458-A1. 

PD 22-JAN-1998. 

PF 11-JUL-1997; J02414. 

PR 14-MAY-1997; JP-124063. 

PR 16-JUL-1996; JP-186220. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 98-110528/10. 

DR N-PSDB; V15181. 

PT Human serrate-2 gene expression products - used to regulate stem 

PT cell differentiation, useful in treating neoplasms, e.g. leukaemia 

PS Claim 3; Page 62-68; 103pp; Japanese. 

CC The present sequence represents human serrate 2. The present invention 

CC also describes a method for the preparation of the polypeptides, and 

CC antibodies binding to the polypeptide and its fragments. The polypeptide 

CC and its fragments expressed by the serrate-2 gene can be used to inhibit 

CC stem (especially blood stem) cell differentiation and to inhibit 

CC endothelial cell growth. They may be incorporated in a cell culture 

CC media for culturing undifferentiated stem cells. They can also be used 

CC for treatment of neoplasms such as leukaemia. The antibodies can be used 

CC for the diagnosis of malignant tumours. 

SQ Sequence 1212 AA; 

Query Match 13.9%; Score 121; DB 29; Length 1212; 

Best Local Similarity 34.1%; Pred. No. 4.40e-02; 

Matches 15; Conservative 15; Mismatches 10; Indels 4; Gaps 4; 

Db 580 cgphgrc-vsqpggnfscicdsgftgtycheniddclgqpcrng 622 

I Ihl :|: I : 1 : 1 : : 1 1 : 1 I : hh h: 
Qy 4 C-HHGQCHISDRGEPY-CLCQPGFSGHHCEQE-NPCMGEIVREA 44 



RESULT 10 

ID W05834 standard; Protein; 1257 AA. 

AC W05834; 

DT 28-JAN-1997 (first entry) 

DE Human Serrate-2 (HJ2). 

KW Serrate-2; human Jagged-2; HJ2; Notch; cell differentiation; 

KW cell fate; central nervous system; cancer; tissue repair; therapy; 

KW diagnosis; antibody. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT domain 1..912 
FT /label= Extracellular_domain 
FT /note= "a deletion in the encoding cDNA clone 
FT results in loss of part of the Serrate-2 
FT signal peptide and beginning of the DSL 
FT domain" 
FT domain 26..70 
FT /label= DSL 
FT /note= "region of homology with Drosophila Delta 
FT and Serrate, predicted to mediate binding 
FT with Notch" 
FT domain 75..735 
FT /label= ELR 
FT /note= "epidermal growth factor-like repeat domain" 
FT region 75..105 
FT /label= ELR1 
FT region 106..140 
FT /label= ELR2 
FT region 141..180 
FT /label= ELR3 
FT region 181..218 
FT /label= ELR4 
FT region 219..256 
FT /label= ELR5 
FT region 257..294 
FT /label= ELR6 
FT region 295..331 
FT /label= ELR7 
FT region 332..369 
FT /label= ELR8 
FT region 370..407 
FT /label= ELR9 
FT region 408..435 
FT /label= Partial_ELR 
FT region 436..469 
FT /label= Partial_ELR 
FT region 470..507 
FT /label= ELR10 
FT region 508..545 
FT /label= ELR11 
FT region 546..584 
FT /label= ELR12 
FT region 585..622 
FT /label= ELR13 
FT region 623..660 
FT /label= ELR14 
FT region 664..701 
FT /label= ELR15 
FT region 702..718 
FT /label= Partial_ELR 
FT region 719..735 
FT /label= Partial_ELR 
FT domain 913..933 
FT /label= Transmembrane_domain 
FT domain 934..1257 
FT /label= Intracellular_domain 

PN WO9627610-A1. 

PD 12-SEP-1996. 

PF 07-MAR-1996; U03172. 

PR 07-MAR-1995; US-400159. 

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY. 

PA (Ora ) UNIV YALE. 

PI Artavanis-Tsakonas S, Gray GE, Henrique DMP, Ish-Horowicz D; 

PI Lewis JH, Mann RS, Myat AM; 

DR WPI; 96-425379/42. 

DR N-PSDB; W05834. 

PT Vertebrate Serrate protein and related DNA - used to treat or 

PT prevent malignancies characterised by increased Notch activity. 

PS Claim 5; Page 104-107; 161pp; English. 

CC Human Serrate-1 (W05833) and human Serrate-2 (W05834) are ligands 

CC for the zygotic neurogenic locus Notch, and are believed to play a 

CC major role in determining cell fates (differentiation) in the 

CC central nervous system. Their amino acid sequences were deduced 

CC from cDNA clones (see also T40090-91) isolated from human foetal 



CC brain cDNA libraries. The proteins, antibodies raised to them, 

CC and encoding nucleic acids can be used in the detection of 

CC Serrate sequences and in the treatment of disorders of cell fate 

CC or differentiation, partic. cancer, nervous system disorders 

CC and in tissue repair or regeneration. 

SQ Sequence 1257 AA; 

Query Match 13.9%; Score 121; DB 19; Length 1257; 

Best Local Similarity 34.1%; Pred. No. 4.40e-02; 

Matches 15; Conservative 15; Mismatches 10; Indels 4; Gaps 4; 

Db 436 cgphgrc-vsqpggnfscicdsgftgtycheniddclgqpcrng 478 

I Ihl :|: I : |:|::||:| I :: : |:|: |:: 
Qy 4 C-HHGQCHISDRGEPY-CLCQPGFSGHHCEQE-NPCMGEIVREA 44 



RESULT 11 

ID W39256 standard; protein; 612 AA. 

AC W39256; 

DT 19-MAY-1998 (first entry) 

DE Human partial mature membrane protein. 

KW Epidermal growth factor motif; EGF motif; membrane protein; disease; 

KW brain; nervous tissue; cancer. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT Protein 1..612 

FT /note= "partial mature protein" 

PN J10036395-A. 

PD 10-FEB-1998. 

PF 24-JUL-1996; 194467. 

PR 24-JUL-1996; JP-194467. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

DR WPI; 98-174912/16. 

PT New human membrane protein - specifically expressed in brain and 

PT nervous tissue; used in diagnosis of diseases specific to these 

PT tissues and cancer 

PS Claim 1; Column 18-19; 26pp; Japanese. 

CC W39256 represents the partial mature amino acid sequence of a novel 

CC membrane protein which contains epidermal growth factor (EGF) motifs. 

CC The new membrane protein is expressed specifically in brain and nervous 

CC tissue. The protein and DNA can be used in the diagnosis of brain and 

CC nerve system specific diseases and cancer. 

SQ Sequence 612 AA; 

Query Match 13.7%; Score 120; DB 28; Length 612; 

Best Local Similarity 48.6%; Pred. No. 5.31e-02; 

Matches 18; Conservative 7; Mismatches 8; Indels 4; Gaps 4; 



Db 451 cahgtcr-sv-gtsykclcdpgyhglyceeeynecls 485 
I II I: I t :| 111:11: I l|:| I |:: 
Qy 4 CHHGQCHISDRGEPY-CLCQPGFSGHHCEQE-NPCMG 38 



RESULT 12 

ID W39257 standard; protein; 737 AA. 
AC W39257; 
DT 19-MAY-1998 (first entry) 
DE Human membrane protein. 
KW Epidermal growth factor motif; EGF motif; membrane protein; 
KW brain; nervous tissue; cancer; disease. 
OS Homo sapiens. 
FH Key Location/Qualifiers 
FT Peptide 1..26 
FT /label= signal 
FT Protein 27..737 
FT /label= membrane_protein 
PN J10036395-A. 
PD 10-FEB-1998. 
PF 24-JUL-1996; 194467. 
PR 24-JUL-1996; JP-194467. 
PA (ASAH ) ASAHI KASEI KOGYO KK. 
DR WPI; 98-174912/16. 
DR N-PSDB; V09641. 



PT New human membrane protein - specifically expressed in brain and 

PT nervous tissue; used in diagnosis of diseases specific to these 

PT tissues and cancer 

PS Claim 2; Pages 19-21; 26pp; Japanese. 

CC W39257 represents the amino acid sequence of a novel membrane protein 

CC which contains epidermal growth factor (EGF) motifs. The new membrane 

CC protein is expressed specifically in brain and nervous tissue. The 

CC protein and DNA can be used in the diagnosis of brain and nerve system 

CC specific diseases and cancer. 

SQ Sequence 737 AA; 

Query Match 13.7%; Score 120; DB 28; Length 737; 

Best Local Similarity 48.6%; Pred. No. 5.31e-02; 

Matches 18; Conservative 7; Mismatches 8; Indels 4; Gaps 4; 

Db 477 cahgtcr-sv-gtsykclcdpgyhglyceeeynecls 511 

I I! I: I I :l llhll: I I: | :: 
Qy 4 CHHGQCHISDRGEPY-CLCQPGFSGHHCEQE-NPCMG 38 



RESULT 13 

ID R64230 standard; Protein; 121 AA. 
AC R64230; 
DT 24-AUG-1995 (first entry) 

DE HCT-15 contg. hTG-700 tumour cell proliferation inhibiting factor. 

KW HCT-15; tumour cell proliferation inhibiting factor; hTG-700; 

KW leukaemia; renal cancer; cervical cancer. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT protein 15..60 

FT /label= hTG-700 

PN WO9429340-A. 

PD 22-DEC-1994. 

PF 02-JUN-1994; J00895. 

PR 04-JUN-1993; JP-134854. 

PR 04-JUN-1993; JP-134855. 

PA (TAIS ) TAISHO PHARM CO LTD. 

PI Hanada K, Komurasaki T, Nakazawa K, Takahashi M; 

PI Toyoda H, Uchida D, Udaka S; 

DR WPI; 95-036401/05. 

DR N-PSDB; Q80358. 

PT Human tumour cell proliferation inhibiting factor - for the 

PT treatment of leukaemia, renal cancer, cervical cancer etc. 

PS Claim 1; Fig 4; 62pp; Japanese. 

CC Q80358 encodes R64230 HCT-15, which contains the human tumour 

CC cell proliferation inhibiting factor hTG-700 (R64229). The factor 

CC may be useful in remedies for leukaemia, renal cancer and cervical 

CC cancer. 

SQ Sequence 121 AA; 

Query Match 13.5%; Score 118; DB 12; Length 121; 

Best Local Similarity 43.3%; Pred. No. 7.73e-02; 
Matches 13; Conservative 8; Mismatches 8; Indels 1; Gaps 1; 

Db 28 clhgqciylvdmsqnycrcevgytgvrceh 57 

I INI : I :: II |: |::| :||: 
Qy 4 CHHGQC-HISDRGEPYCLCQPGFSGHHCEQ 32 



RESULT 14 

ID W35957 standard; Protein; 381 AA. 

AC W35957; 

DT 05-MAR-1998 (first entry) 

DE Human monocyte mature differentiation factor. 

KW Human; monocyte; mature; differentiation factor; MMDF; macrophage; 

KW cancer; immune activator; tissue culture; infectious disease. 

OS Homo sapiens. 

PN J09234079-A. 

PD 09-SEP-1997. 

PF 04-MAR-1996; 075236. 

PR 04-MAR-1996; JP-075236. 

PA (TOYM ) TOYOBO KK. 

DR WPI; 97-497320/46. 



DR N-PSDB; T97142. 

PT A monocyte mature differentiation factor - useful for the long term 

PT tissue culture of macrophage(s) 

PS Claim 9; Page 12-13; 22pp; Japanese. 

CC The present sequence represents a monocyte mature differentiation 

CC factor (MMDF) which maintains the life of macrophages for long periods 

CC in liquid culture. MMDF can be used as an anti-cancer agent, an immune 

CC activator and to treat infectious diseases. 

SQ Sequence 381 AA; 

Query Match 13.1%; Score 114; DB 26; Length 381; 

Best Local Similarity 32.3%; Pred. No. 1.63e-01; 

Matches 20; Conservative 10; Mismatches 29; Indels 3; Gaps 3; 

Db 300 yagclsvkkyrpkyc-gscvdgrcctpqltrtvkmrfrcedgetfsknvmmiqsckcnyn 358 

I:! : I : I 1:1 II:; |:| II :| :| I I 
Qy 51 YASCATASKVPIMECRGGC-GTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCR-A 108 

Db 359 cp 360 

I: 

Qy 109 CS 110 



RESULT 15 

ID W35730 standard; Protein; 381 AA. 

AC W35730; 

DT 27-MAR-1998 (first entry) 

DE Human cysteine rich protein 61 (Cyr61). 

KW Cysteine rich protein 61; Cyr61; human; 

KW extracellular matrix signalling molecule; cell adhesion; 

KW cell migration; cell proliferation; angiogenesis; chondrogenesis; 

KW oncogenesis; haematostasis; wound healing; organ regeneration. 

OS Homo sapiens. 

PN WO9733995-A2. 

PD 18-SEP-1997. 

PF 14-MAR-1997; U04193. 

PR 15-MAR-1996; US-013958. 

PA (MUNI-) MUNIN CORP. 

PI Lau LF; 

DR WPI; 97-470875/43. 

DR N-PSDB; T94699. 

PT Isolated and purified cysteine rich protein 61, Cyr61 - useful to 

PT modulate e.g. haematostasis, induce wound healing, promote organ 

PT regeneration etc 

PS Claim 2; Page 112-113; 133pp; English. 

CC This protein sequence comprises human cysteine rich protein 61 

CC (Cyr61), an extracellular matrix signalling molecule. Its amino 

CC acid sequence was deduced from a human placental cDNA clone (see 

CC T94699). Cyr61 polypeptides can be expressed in transformed or 

CC transfected host cells. Cyr61 can be used to modulate 

CC haematostasis, induce wound healing in a tissue, promote organ 

CC regeneration, improve tissue grafting or promote bone or prothesis 

CC implantation (claimed). It can also be used to screen for a 

CC modulator of angiogenesis, chondrogenesis, oncogenesis, cell 

CC adhesion, cell migration, cell proliferation, expand a population 

CC of undifferentiated haematopoietic stem cells in culture and to 

CC screen for a mitogen (claimed). Ex vivo methods for using 

CC mammalian extracellular matrix signalling molecules to prepare 

CC blood products are also provided. 

SQ Sequence 381 AA; 

Query Match 13.1%; Score 114; DB 26; Length 381; 

Best Local Similarity 32.3%; Pred. No. 1.63e-01; 

Matches 20; Conservative 10; Mismatches 29; Indels 3; Gaps 3; 

Db 300 yagclsvkkyrpkyc-gscvdgrcctpqltrtvkmrfrcedgetfsknvmmiqsckcnyn 358 

11:1 : I : I hi II I :: I hi II :| :| I | 

Qy 51 YASCATASKVPIMECRGGC-GTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCR-A 108 

Db 359 cp 360 

I: 

Qy 109 CS 110 



Scoring table: PAM 150 

SUMMARIES 

Result         Query
   No.  Score  Match  Length  DB  ID      Description            Pred. No.
     1    181   20.7    1469      B36665  slit protein 2 precur  2.45e-18
     2    167   19.1    1480      A36665  slit protein 1 precur  1.43e-15
     3    133   15.2    2531      A46019  gene Notch-1 protein   3.98e-09
     4    131   15.0    2321      S78549  notch3 protein - huma  9.21e-09
     5    131   15.0    2437      S42612  transmembrane protein  9.21e-09
     6    127   14.5     375      A41428  CEF-10 protein precur  4.86e-08
     7    126   14.4     379      A35669  gene CYR61 protein pr  7.35e-08
     8    126   14.4    5147      IJFFTM  cadherin-related tumo  7.35e-08
     9    125   14.3     293      B26637  neurogenic repetitive  1.11e-07
    10    125   14.3    2139      A35672  crumbs protein - frui  1.11e-07
    11    125   14.3    2531      S18188  notch protein homolog  1.11e-07
    12    125   14.3    2555      A40043  notch protein homolog  1.11e-07
    13    122   14.0    2703      A24420  notch protein - fruit  3.79e-07
    14    121   13.9    2524      A35844  Xotch protein - Afric  5.69e-07
    15    118   13.5    4391      A38096  perlecan precursor -   1.91e-06
    16    117   13.4     530      A31640  epidermal growth fact  2.86e-06
    17    116   13.3     861      A48825  Notch homolog Motch p  4.26e-06
    18    115   13.2    1203      A49175  Motch B protein - mou  6.35e-06
    19    115   13.2    1429      S06434  homeotic protein lin-  6.35e-06
    20    115   13.2    2471      A49128  cell-fate determining  6.35e-06
    21    114   13.1     728      I50719  C-Delta-1 - chicken    9.45e-06
    22    114   13.1    2318      S45306  notch 3 protein - mou  9.45e-06
    23    113   12.9     177      A37408  betacellulin precurso  1.40e-05
    24    110   12.6     612   2  S23174  endothelial leukocyte  4.56e-05
    25    110   12.6     618   2  B42755  E-selectin precursor   4.56e-05
    26    110   12.6    1955   1  AGCH    agrin precursor - chi  4.56e-05
    27    109   12.5      46   2  JT0747  epiregulin - rat       6.73e-05
    28    109   12.5     162   2  S68401  epiregulin precursor   6.73e-05
    29    109   12.5     401   2  S65138  glycoprotein antigen   6.73e-05
    30    109   12.5     427   2  S74211  PAS-6/7 protein precu  6.73e-05
    31    109   12.5    2871   2  A55624  fibrillin-1 precursor  6.73e-05
    32    109   12.5    3002   2  A47221  fibrillin 1 precursor  6.73e-05
    33    108   12.4     259   2  S48713  fetal antigen 1 - hum  9.92e-05
    34    108   12.4     260   2  A44549  fetal antigen 1 homeo  9.92e-05
    35    108   12.4     383   2  S53716  homeotic protein dlk   9.92e-05
    36    108   12.4     383   2  B45484  delta-like dlk homeot  9.92e-05
    37    108   12.4     638   2  S08042  proteoglycan core pro  9.92e-05
    38    107   12.3     385   2  S53718  homeotic protein dlk   1.46e-04
    39    107   12.3     385   2  A54785  preadipocyte factor 1  1.46e-04
    40    107   12.3    1295   2  A32901  glpl protein precurso  1.46e-04
    41    106   12.1     159   2  I57497  transforming growth f  2.15e-04
    42    106   12.1     160   1  WFHU1   transforming growth f  2.15e-04
    43    106   12.1     208   2  A38432  heparin-binding EGF-1  2.15e-04
    44    106   12.1     208   2  A41914  diptheria toxin recep  2.15e-04
    45    106   12.1     706   2  H71707  hypothetical protein   2.15e-04



RESULT 1 
ENTRY B36665 #type complete 
TITLE slit protein 2 precursor - fruit fly (Drosophila 
 melanogaster) 
ORGANISM #formal_name Drosophila melanogaster 
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change 
 16-Dec-1998 
ACCESSIONS B36665 
REFERENCE A36665 
 #authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.; 
 Artavanis-Tsakonas, S. 
 #journal Genes Dev. (1990) 4:2169-2187 
 #title slit: an extracellular protein necessary for development of 
 midline glia and commissural axon pathways contains both 
 EGF and LRR domains. 
 #cross-references MUID:91099665 
 #accession B36665 
 ##status preliminary 
 ##molecule_type mRNA 
 ##residues 1-1469 ##label ROT 
 ##cross-references GB:X53959 
GENETICS 
 #gene FlyBase:sli 
 ##cross-references FlyBase:FBgn0003425 
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF 
 homology; leucine-rich alpha-2-glycoprotein repeat 
 homology; proteoglycan carboxyl-terminal homology 
FEATURE 
 66-91 #domain proteoglycan amino-terminal homology #label PAH1\ 
 101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\ 
 125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\ 
 149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\ 
 173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\ 
 197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\ 
 228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\ 
 288-313 #domain proteoglycan amino-terminal homology #label PAH2\ 
 323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\ 
 347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\ 
 371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\ 
 395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\ 
 419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\ 
 450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\ 
 512-537 #domain proteoglycan amino-terminal homology #label PAH3\ 
 547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\ 
 572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\ 
 596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\ 
 620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\ 
 651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\ 
 708-733 #domain proteoglycan amino-terminal homology #label PAH4\ 
 743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\ 
 767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\ 
 846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\ 
 1028-1061 #domain EGF homology #label EGF 
SUMMARY #length 1469 #molecular-weight 164695 #checksum 8361 

Query Match 20.7%; Score 181; DB 2; Length 1469; 

Best Local Similarity 31.3%; Pred. No. 2.45e-18; 

Matches 35; Conservative 22; Mismatches 46; Indels 9; Gaps 9; 

Db 1361 KCRRGSRCVPNSNARDGYQCKCKHGQRGRYCDQAASTCRKEQVREYYT-ENDCRSRQPL- 1418 

: I : : I I I I I : I : I : I I III ::| I 
Qy 3 KCHHG-QC-HISDRGEPY-CLCQPGFSGHHCEQENP-CMGEIVREAIRRQKDYASCATAS 58 

Db 1419 KYA-K-CVGGCGNQCCAAKIVRRRKVRMVCSNNRKYIKNLDIVRKCGCTKKC 1468 

I : : I Mil II : :lll |:: :: ::: III : I 
Qy 59 KVPIMECRGGCGTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGC-RAC 109 



RESULT 2 
ENTRY A36665 #type complete 
TITLE slit protein 1 precursor - fruit fly (Drosophila 
 melanogaster) 
ORGANISM #formal_name Drosophila melanogaster 
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change 
 24-Sep-1998 
ACCESSIONS A36665; S13523 
REFERENCE A36665 
 #authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.; 
 Artavanis-Tsakonas, S. 
 #journal Genes Dev. (1990) 4:2169-2187 
 #title slit: an extracellular protein necessary for development of 
 midline glia and commissural axon pathways contains both 
 EGF and LRR domains. 
 #cross-references MUID:91099665 
 #accession A36665 
 ##status preliminary 
 ##molecule_type mRNA 
 ##residues 1-1480 ##label ROT 
 ##cross-references GB:X53959; NID:g8614; PID:g8615 
GENETICS 
 #gene FlyBase:sli 
 ##cross-references FlyBase:FBgn0003425 
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF 
 homology; leucine-rich alpha-2-glycoprotein repeat 
 homology; proteoglycan carboxyl-terminal homology 
KEYWORDS alternative splicing 
FEATURE 
 66-91 #domain proteoglycan amino-terminal homology #label PAH1\ 
 101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\ 
 125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\ 
 149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\ 
 173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\ 
 197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\ 
 228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\ 
 288-313 #domain proteoglycan amino-terminal homology #label PAH2\ 
 323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\ 
 347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\ 
 371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\ 
 395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\ 
 419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\ 
 450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\ 
 512-537 #domain proteoglycan amino-terminal homology #label PAH3\ 
 547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\ 
 572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\ 
 596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\ 
 620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\ 
 651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\ 
 708-733 #domain proteoglycan amino-terminal homology #label PAH4\ 
 743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\ 
 767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\ 
 791-814 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR17\ 
 815-838 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR18\ 
 846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\ 
 1028-1061 #domain EGF homology #label EGF 
SUMMARY #length 1480 #molecular-weight 165751 #checksum 900 

Query Match 19.1%; Score 167; DB 2; Length 1480; 

Best Local Similarity 29.4%; Pred. No. 1.43e-15; 

Matches 35; Conservative 25; Mismatches 47; Indels 12; Gaps 12; 



Db 1361 KCRRGSRCVPNSNARDGYQCKCKHGQRGRYCDQGEGSTEPPTVTAASTCRKEQVREYYTE 1420 

I:: :| I : : I I I I I : I : I I : II |: I ::| : 
Qy 3 KCHHG-QC-HISDRGEPY-CLCQPGFSGHHCEQ-ENPCMGEIVREA-I-RR-Q-KDY-AS 53 

Db 1421 NDCRSRQPLKYAKCVGGCGNQCCAAKIVRRRKVRMVCSNNRKYIKNLDIVRRCGCTKKC 1479 

I: I" I llll II : :lll I" :: ::: III : I 
Qy 54 CATASKVPIM-E-CRGGCGTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGC-RAC 109 



RESULT 3 
ENTRY A46019 #type complete 
TITLE gene Notch-1 protein - mouse 
ORGANISM #formal_name Mus musculus #common_name house mouse 
DATE 22-Sep-1993 #sequence_revision 18-Nov-1994 #text_change 
 14-Aug-1998 
ACCESSIONS A46019 
REFERENCE A46019 
 #authors del Amo, F.F.; Gendron-Maguire, M.; Swiatek, P.J.; Jenkins, 
 N.A.; Copeland, N.G.; Gridley, T. 
 #journal Genomics (1993) 15:259-264 
 #title Cloning, analysis, and chromosomal localization of Notch-1, a 
 mouse homolog of Drosophila Notch. 
 #cross-references MUID:93194170 
 #accession A46019 
 ##status preliminary; not compared with conceptual translation 
 ##molecule_type nucleic acid 
 ##residues 1-2531 ##label DEL 
 ##cross-references GB:Z11886; GB:S47228; NID:g288502; PID:g288503 
 ##note sequence extracted from NCBI backbone (NCBIP: 12731B) 
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin 
 repeat homology; EGF homology 
FEATURE 
 757-788 #domain EGF homology #label EGF\ 
 1917-1948 #domain ankyrin repeat homology #label AN1\ 
 1949-1981 #domain ankyrin repeat homology #label AN2\ 
 1983-2015 #domain ankyrin repeat homology #label AN3\ 
 2016-2048 #domain ankyrin repeat homology #label AN4\ 
 2049-2081 #domain ankyrin repeat homology #label AN5 
SUMMARY #length 2531 #molecular-weight 271312 #checksum 6611 

Query Match 15.2%; Score 133; DB 2; Length 2531; 

Best Local Similarity 25.4%; Pred. No. 3.98e-09; 

Matches 29; Conservative 29; Mismatches 47; Indels 9; Gaps 9; 

Db 575 CHYGSCK-DGVATFTCLCQPGYTGHHCETNINECHSQPCRHGGTCQDRDNSYLCLCLKGT 633 

II I I : lllll|::||IM : I I :: I : : |:: I I :: 
Qy 4 CHHGQCHISDRGEPYCLCQPGFSGHHCEQE-NPCMGEIVREA-I-R-RQKDYAS-CATAS 58 

Db 634 TGPNCEINLDDCASNPCDSGTCLDKIDGYECACEPGYTGSMCNVNIDECAGSPC 687 

I I |:: I:: : ::|: ::: ; ; ;; ||; :| 
Qy 59 KVPIMECR-GGCGTTCCQPIRSKRRKYVFQCTDGSSFVEEV-ERHL-ECGCRAC 109 



RESULT 4 
ENTRY S78549 #type complete 
TITLE notch3 protein - human 
ORGANISM #formal_name Homo sapiens #common_name man 
DATE 24-Jul-1998 #sequence_revision 24-Jul-1998 #text_change 
 17-Mar-1999 
ACCESSIONS S78549; S71825 
REFERENCE S78549 
 #authors Joutel, A.; Tournier-Lasserve, E. 
 #submission submitted to the EMBL Data Library, April 1997 
 #accession S78549 
 ##molecule_type mRNA 
 ##residues 1-2321 ##label JOU1 
 ##cross-references EMBL:U97669; NID:g2668591; PID:g2668592 
REFERENCE S71825 
 #authors Joutel, A.; Corpechot, C.; Ducros, A.; Vahedi, K.; Chabriat, 
 H.; Mouton, P.; Alamowitch, S.; Domenga, V.; Cecillion, M.; 
 Marechal, E.; Maciazek, J.; Vayssiere, C.; Cruaud, C.; 
 Cabanis, E.A.; Ruchoux, M.M.; Weissenbach, J.; Bach, J.F.; 
 Bousser, M.G.; Tournier-Lasserve, E. 
 #journal Nature (1996) 383:707-710 
 #title Notch3 mutations in CADASIL, a hereditary adult-onset 
 condition causing stroke and dementia. 
 #cross-references MUID:97032728 
 #accession S71825 
 ##status nucleic acid sequence not shown 
 ##molecule_type DNA 
 ##residues 67-113;138-194;268-333,'G',335-346;536-613;716-765; 
 1240-1279;1815-1888 ##label JOU2 
 ##cross-references EMBL:U97669 
GENETICS 
 #gene notch3 
 #map_position 19p13.1 
FUNCTION 
 #description may be involved in pathogenesis of CADASIL, causing a type of 
 stroke and dementia 
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin 
 repeat homology; EGF homology 
KEYWORDS tandem repeat; transmembrane protein 
FEATURE 
 318-349 #domain EGF homology #label EGF\ 
 1838-1870 #domain ankyrin repeat homology #label AN1\ 
 1871-1903 #domain ankyrin repeat homology #label AN2\ 
 1905-1937 #domain ankyrin repeat homology #label AN3\ 
 1938-1970 #domain ankyrin repeat homology #label AN4\ 
 1971-2003 #domain ankyrin repeat homology #label AN5 
SUMMARY #length 2321 #molecular-weight 243657 #checksum 3337 



Query Match 15.0%; Score 131; DB 2; Length 2321; 

Best Local Similarity 42.1%; Pred. No. 9.21e-09; 

Matches 16; Conservative 10; Mismatches 9; Indels 3; Gaps 3; 

Db 1055 CQAGGQC-VDEDSSHYCVCPEGRTGSHCEQEVDPCLAQ 1091 

1:111::: Ihl I :l Ml :||::: 
Qy 4 CHHG-QCHISDRGEPYCLCQPGFSGHHCEQE-NPCMGE 39 



RESULT 5
ENTRY S42612 #type complete
TITLE transmembrane protein precursor - zebra fish
ORGANISM #formal_name Brachydanio rerio #common_name zebra fish
DATE 20-Feb-1995 #sequence_revision 20-Feb-1995 #text_change
      10-Jul-1998
ACCESSIONS S42612
REFERENCE S42612
   #authors Bierkamp, C.; Campos-Ortega, J.A.
   #journal Mech. Dev. (1993) 43:87-100
   #title A zebrafish homologue of the Drosophila neurogenic gene Notch
      and its pattern of transcription during early
      embryogenesis.
   #cross-references MUID:94128602
   #accession S42612
      ##status preliminary
      ##molecule_type mRNA
      ##residues 1-2437 ##label BIE
      ##cross-references EMBL:X69088; NID:g433866; PID:g433867
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
      repeat homology
FEATURE
   1915-1947 #domain ankyrin repeat homology #label AN1\
   1948-1980 #domain ankyrin repeat homology #label AN2\
   1982-2014 #domain ankyrin repeat homology #label AN3\
   2015-2047 #domain ankyrin repeat homology #label AN4\
   2048-2080 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2437 #molecular-weight 262306 #checksum 4021



Query Match 15.0%; Score 131; DB 2; Length 2437;
Best Local Similarity 41.0%; Pred. No. 9.21e-09;
Matches 16; Conservative 12; Mismatches 9; Indels 2; Gaps 2;



Db 1352 SLRCRNGATCVSGHLSPRCLCAPGFSGHECQTRMDSPCL 1390
:::!::! :| : I III INI |: : ::||: 
Qy 1 AFKCHHGQCHISDRGEPYCLCQPGFSGHHCE-Q-ENPCM 37



RESULT 6 

ENTRY A41428 #type complete
TITLE CEF-10 protein precursor - chicken
ORGANISM #formal_name Gallus gallus #common_name chicken
DATE 03-Apr-1992 #sequence_revision 03-Apr-1992 #text_change
      10-Sep-1997
ACCESSIONS A41428
REFERENCE A41428
   #authors Simmons, D.L.; Levy, D.B.; Yannoni, Y.; Erikson, R.L.
   #journal Proc. Natl. Acad. Sci. U.S.A. (1989) 86:1178-1182
   #title Identification of a phorbol ester-repressible v-src-inducible
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gene. 

   #cross-references MUID:89145206
   #accession A41428
      ##status preliminary
      ##molecule_type mRNA
      ##residues 1-375 ##label SIM
      ##cross-references GB:J04496; NID:g211435; PID:g211436
SUMMARY #length 375 #molecular-weight 40651 #checksum 1417

Query Match 14.5%; Score 127; DB 2; Length 375;
Best Local Similarity 32.3%; Pred. No. 4.86e-08;
Matches 20; Conservative 11; Mismatches 28; Indels 3; Gaps 3;

Db 295 YAGCSSVKKYRPKYC-GSCVDGRCCTPQQTRTVKIRFRCDDGETFTKSVMMIQSCRCNYN 353 

IN:: I : I |:| II I ::: I |:| || :| I I | 
Qy 51 YASCATASKVPIMECRGGC-GTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCR-A 108 

Db 354 CP 355 
I: 

Qy 109 CS 110 
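
The three statistics lines that precede each alignment are simple "name value;" pairs, so they can be read mechanically. The following is a minimal sketch, not part of the search program: the names `FIELD` and `parse_stats` are hypothetical, and the input is the statistics block of this result with the OCR commas read as decimal points. It also checks that the four column counts add up to the alignment length implied by the similarity percentage.

```python
import re

# One "name value;" field, e.g. "Pred. No. 4.86e-08;" or "Query Match 14.5%;".
FIELD = re.compile(r"([A-Za-z][A-Za-z. ]*?)\s+([0-9][0-9.eE+-]*)%?\s*;")


def parse_stats(block):
    """Return a dict mapping each field name to its numeric value."""
    return {name: float(value) for name, value in FIELD.findall(block)}


block = """Query Match 14.5%; Score 127; DB 2; Length 375;
Best Local Similarity 32.3%; Pred. No. 4.86e-08;
Matches 20; Conservative 11; Mismatches 28; Indels 3; Gaps 3;"""

stats = parse_stats(block)
# Every aligned column is a match, a conservative change, a mismatch or an indel.
columns = sum(stats[k] for k in ("Matches", "Conservative", "Mismatches", "Indels"))
similarity = round(100.0 * stats["Matches"] / columns, 1)
print(stats["Score"], stats["Pred. No."], columns, similarity)
```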

RESULT 7
ENTRY A35669 #type complete
TITLE gene CYR61 protein precursor - mouse
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 28-Sep-1990 #sequence_revision 18-Nov-1992 #text_change
      16-Dec-1998
ACCESSIONS A35669; I48319; S16446
REFERENCE A35669
   #authors O'Brien, T.P.; Yang, G.P.; Sanders, L.; Lau, L.F.
   #journal Mol. Cell. Biol. (1990) 10:3569-3577
   #title Expression of cyr61, a growth factor-inducible
      immediate-early gene.
   #cross-references MUID:90287146
   #accession A35669
      ##status preliminary
      ##molecule_type mRNA
      ##residues 1-379 ##label OAB
      ##cross-references GB:M32490; NID:g192909; PID:g309206
      ##note the authors translated the codon GAT for residue 337 as
         Gln
REFERENCE I48319
   #authors Latinkic, B.V.; O'Brien, T.P.; Lau, L.F.
   #journal Nucleic Acids Res. (1991) 19:3261-3267
   #title Promoter function and structure of the growth
      factor-inducible immediate early gene cyr61.
   #cross-references MUID:91288203
   #accession I48319
      ##status translated from GB/EMBL/DDBJ
      ##molecule_type DNA
      ##residues 1-379 ##label RES
      ##cross-references EMBL:X56790; NID:g50632; PID:g50633
      ##note the authors did not translate the codon for residue 108
      ##note the authors translated the codon GAT for residue 337 as
         Gln
GENETICS
   #gene CYR61
   #introns 21/3; 93/1; 208/1; 279/3
CLASSIFICATION #superfamily von Willebrand factor type C repeat homology
FEATURE
   99-166 #domain von Willebrand factor type C repeat homology
      #label VWC
SUMMARY #length 379 #molecular-weight 41709 #checksum 3726

Query Match 14.4%; Score 126; DB 2; Length 379;
Best Local Similarity 32.3%; Pred. No. 7.35e-08;
Matches 20; Conservative 12; Mismatches 27; Indels 3; Gaps 3;

Db 298 YAGCSSVKKYRPKYC-GSCVDGRCCTPLQTRTVKMRFRCEDGEMFSKNVMMIQSCKCNYN 356 

Ihh: I : I |:l II |:::: I |:| II I :| II 
Qy 51 YASCATASKVPIMECRGGC-GTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCR-A 108 



Db 357 CP 358 

I: 

Qy 109 CS 110 



RESULT 8
ENTRY IJFFTM #type complete
TITLE cadherin-related tumor suppressor precursor - fruit fly
      (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Sep-1993 #sequence_revision 30-Sep-1993 #text_change
      16-Feb-1997
ACCESSIONS A41087; B41087
REFERENCE A41087
   #authors Mahoney, P.A.; Weber, U.; Onofrechuk, P.; Biessmann, H.;
      Bryant, P.J.; Goodman, C.S.
   #journal Cell (1991) 67:853-868
   #title The fat tumor suppressor gene in Drosophila encodes a novel
      member of the cadherin gene superfamily.
   #cross-references MUID:92069752
   #accession A41087
      ##molecule_type mRNA
      ##residues 143-485;1279-5147 ##label MAH
      ##cross-references GB:M80537
   #accession B41087
      ##molecule_type DNA
      ##residues 1-142;487-1278 ##label MA2
      ##cross-references GB:M80537
      ##note 1229-Gly and 1233-Ser were also found
GENETICS
   #gene fat
      ##cross-references FlyBase:FBgn0001075
CLASSIFICATION #superfamily cadherin-related tumor suppressor; cadherin
      repeat homology; EGF homology
KEYWORDS calcium binding; cell adhesion; duplication; transmembrane
      protein
FEATURE
   1-35 #domain signal sequence #status predicted #label SIG\
   36-5147 #product cadherin-related tumor suppressor #status
      predicted #label MAT\
   36-4583 #domain extracellular #status predicted #label EXT\
   51-156 #domain cadherin repeat homology #label CR1\
   159-270 #domain cadherin repeat homology #label CR2\
   271-382 #domain cadherin repeat homology #label CR3\
   390-494 #domain cadherin repeat homology #label CR4\
   497-599 #domain cadherin repeat homology #label CR5\
   602-708 #domain cadherin repeat homology #label CR6\
   718-822 #domain cadherin repeat homology #label CR7\
   831-942 #domain cadherin repeat homology #label CR8\
   948-1049 #domain cadherin repeat homology #label CR9\
   1052-1153 #domain cadherin repeat homology #label C10\
   1156-1278 #domain cadherin repeat homology #label C11\
   1281-1384 #domain cadherin repeat homology #label C12\
   1387-1489 #domain cadherin repeat homology #label C13\
   1492-1601 #domain cadherin repeat homology #label C14\
   1607-1713 #domain cadherin repeat homology #label C15\
   1717-1823 #domain cadherin repeat homology #label C16\
   1826-1922 #domain cadherin repeat homology #label C17\
   1925-2027 #domain cadherin repeat homology #label C18\
   2028-2167 #domain cadherin repeat homology #label C19\
   2169-2278 #domain cadherin repeat homology #label C20\
   2281-2384 #domain cadherin repeat homology #label C99\
   2387-2491 #domain cadherin repeat homology #label C21\
   2494-2596 #domain cadherin repeat homology #label C22\
   2599-2703 #domain cadherin repeat homology #label C23\
   2707-2810 #domain cadherin repeat homology #label C24\
   2813-2913 #domain cadherin repeat homology #label C25\
   2915-3013 #domain cadherin repeat homology #label C26\
   3014-3124 #domain cadherin repeat homology #label C27\
   3127-3229 #domain cadherin repeat homology #label C28\
   3232-3334 #domain cadherin repeat homology #label C29\
   3337-3439 #domain cadherin repeat homology #label C30\
   3442-3545 #domain cadherin repeat homology #label C31\
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3548-3651 tdomain cadherin repeat homology tlabel C32\ 

   3654-3756 #domain cadherin repeat homology #label C33\
   3954-4010 #domain EGF homology #label EG1\
   4017-4048 #domain EGF homology #label EG2\
   4056-4089 #domain EGF homology #label EG3\
   4096-4127 #domain EGF homology #label EG4\
   4584-4609 #domain transmembrane #status predicted #label TMM\
   4610-5147 #domain intracellular #status predicted #label INT
SUMMARY #length 5147 #molecular-weight 564895 #checksum 6994



Query Match 14.4%; Score 126; DB 1; Length 5147;
Best Local Similarity 44.4%; Pred. No. 7.35e-08;
Matches 16; Conservative 9; Mismatches ; Indels 3; Gaps 3;

Db 4061 CRNGGSCQRSPDGSSYFCLCRPGFRGNQCESVSDSC 4096
I::! I: I hi MM |::|l ::| 
Qy 4 CHHG-QCHISDRGEPY-CLCQPGFSGHHCEQ-ENPC 36



RESULT 9
ENTRY B26637 #type fragment
TITLE neurogenic repetitive locus 95F protein - fruit fly
      (Drosophila melanogaster) (fragment)
ORGANISM #formal_name Drosophila melanogaster
DATE 16-Aug-1988 #sequence_revision 16-Aug-lS #text_change
      14-Aug-1998
ACCESSIONS B26637
REFERENCE A91081
   #authors Knust, E.; Dietrich, U.; Tepass, U.; Bremer, K.A.; Weigel,
      D.; Vaessin, H.; Campos-Ortega, J.A.
   #journal EMBO J. (1987) 6:761-766
   #title EGF homologous sequences encoded in the genome of Drosophila
      melanogaster, and their relation to neurogenic genes.
   #cross-references MUID:87218537
   #accession B26637
      ##molecule_type mRNA
      ##residues 1-293 ##label KNU
      ##cross-references GB:X05144; NID:g7519; PID:g929536
GENETICS
   #gene FlyBase:crb
      ##cross-references FlyBase:FBgn0000368
CLASSIFICATION #superfamily EGF homology
KEYWORDS transmembrane protein
FEATURE
   216-252 #domain EGF homology #label EGF
SUMMARY #length 293 #checksum 3413



Query Match 14.3%; Score 125; DB 2; Length 293;
Best Local Similarity 42.1%; Pred. No. 1.11e-07;
Matches 16; Conservative 10; Mismatches 9; Indels 3;

Db 144 CLNNGTC-INQVAAFFCQCQPGFEGQHCEQNIDECADQ 180

I ::| I I:: : :| Mill Mill: : I : 
Qy 4 C-HHGQCHISDRGEPYCLCQPGFSGHHCEQE-NPCMGE 39



RESULT 10
ENTRY A35672 #type complete
TITLE crumbs protein - fruit fly (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 21-Sep-1990 #sequence_revision 18-Nov-1992 #text_change
      14-Aug-1998
ACCESSIONS A35672
REFERENCE A35672
   #authors Tepass, U.; Theres, C.; Knust, E.
   #journal Cell (1990) 61:787-799
   #title crumbs encodes an EGF-like protein expressed on apical
      membranes of Drosophila epithelial cells and required for
      organization of epithelia.
   #cross-references MUID:90263104
   #accession A35672
      ##status preliminary
      ##molecule_type mRNA
      ##residues 1-2139 ##label TEP
      ##cross-references GB:M33753
      ##note the authors translated the codon GGC for residue 1928 as
         Cys, and TAT for residue 2023 as Gln
GENETICS
   #gene FlyBase:crb
      ##cross-references FlyBase:FBgn0000368
CLASSIFICATION #superfamily EGF homology
KEYWORDS transmembrane protein
FEATURE
   691-722 #domain EGF homology #label EGF
SUMMARY #length 2139 #molecular-weight 233619 #checksum 7230



Query Match 14.3%; Score 125; DB 2; Length 2139;
Best Local Similarity 42.1%; Pred. No. 1.11e-07;
Matches 16; Conservative 10; Mismatches 9; Indels 3;

Db 1806 CLNNGTC-INQVAAFFCQCQPGFEGQHCEQNIDECADQ 1842

I -I I I :: : :| Mill MM: : I : 
Qy 4 C-HHGQCHISDRGEPYCLCQPGFSGHHCEQE-NPCMGE 39



RESULT 11
ENTRY S18188 #type complete
TITLE notch protein homolog - rat
ORGANISM #formal_name Rattus norvegicus #common_name Norway rat
DATE 19-Feb-1994 #sequence_revision 10-Nov-1995 #text_change
      12-Feb-1999
ACCESSIONS
REFERENCE
   #authors Weinmaster, G.; Roberts, V.J.; Lemke, G.
   #journal Development (1991) 113:199-205
   #title A homolog of Drosophila Notch expressed during mammalian
      development.
   #cross-references MUID:92111383
   #accession S18188
      ##molecule_type mRNA
      ##residues 1-2531 ##label WEI
      ##cross-references EMBL:X57405; NID:g57634; PID:g57635
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
      repeat homology
FEATURE
   1917-1949 #domain ankyrin repeat homology #label AN1\
   1950-1982 #domain ankyrin repeat homology #label AN2\
   1984-2016 #domain ankyrin repeat homology #label AN3\
   2017-2049 #domain ankyrin repeat homology #label AN4\
   2050-2082 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2531 #molecular-weight 270907 #checksum 2705



Query Match 14.3%; Score 125; DB 2; Length 2531;

Best Local Similarity 25.4%; Pred. No. 1.11e-07;

Matches 30; Conservative 31; Mismatches 45; Indels 12; Gaps 13 

Db 570 CDPDPCHIGLCKDGVATFTCLCQPGYTGHHCETNINECHSQPCRHGGTCQDRDNYYLCLC 629 

I MM : I : MINIMUM MM: : :::| | I 
Qy 4 CHHGQCHISD-R-G-EPY-CLCQPGFSGHHCEQE-NPCMGEIVREA-IRRQKD-YAS-C 54 

Db 630 LKGTTGPNCEINLDDCASNPCDSGTCLDKIDGYECACEPGYTGSMCNVNIDECAGSPC 687 

:: I I I:: |:: : ::|: ::: : : :: I M :| 
Qy 55 ATASKVPIMECR-GGCGTTCCQPIRSKRRKYVFQCTDGSSFVEEV-ERHL-ECGCRAC 109



RESULT 12
ENTRY A40043 #type complete
TITLE notch protein homolog TAN-1 precursor - human
ORGANISM #formal_name Homo sapiens #common_name man
DATE 21-Apr-1992 #sequence_revision 21-Apr-1992 #text_change
      14-Aug-1998
ACCESSIONS A40043
REFERENCE A40043
   #authors Ellisen, L.W.; Bird, J.; West, D.C.; Soreng, A.L.; Reynolds,
      T.C.; Smith, S.D.; Sklar, J.
   #journal Cell (1991) 66:649-661
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ttitle TAN-1, the human homolog of the Drosophila Notch gene, is 
broken by chromosomal translocations in T lymphoblastic 
neoplasms . 
   #cross-references MUID:91347367
   #accession A40043
      ##status preliminary; nucleic acid sequence not shown; not
         compared with conceptual translation
      ##molecule_type mRNA
      ##residues 1-2555 ##label ELL
      ##cross-references GB:M73980
CLASSIFICATION tsuperfamily unassigned ankyrin repeat proteins; ankyrin 
repeat homology; EGF homology 

FEATURE
   1149-1180 #domain EGF homology #label EGF\
   1927-1959 #domain ankyrin repeat homology #label AN1\
   1960-1992 #domain ankyrin repeat homology #label AN2\
   1994-2026 #domain ankyrin repeat homology #label AN3\
   2027-2059 #domain ankyrin repeat homology #label AN4\
   2060-2092 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2555 #molecular-weight 272337 #checksum 463

Query Match 14.3%; Score 125; DB 2; Length 2555;
Best Local Similarity 23.7%; Pred. No. 1.11e-07;
Matches 27; Conservative 32; Mismatches 46; Indels 9; Gaps 8;

Db 574 CHYGSCK-DGVATFTCLCRPGYTGHHCETNINECSSQPCRLRGTCQDPDNAYLCFCLKGT 632
II I I : ll:ll::IMI : I I :: :| : : :: | :: 
Qy 4 CHHGQCHISDRGEPYCLCQPGFSGHHCEQE-NPCMGE--IVREAIR-RQKDYAS-CATAS 58

Db 633 TGPNCEINLDDCASSPCDSGTCLDKIDGYECACEPGYTGSMCNSNIDECAGNPC 686 

I I I::: |:: : ::|: ::: : : :: ||: :| 
Qy 59 KVPIMECR-GGCGTTCCQPIRSKRRKYVFQCTDGSSFVEEV-ERHL-ECGCRAC 109



RESULT 13
ENTRY A24420 #type complete
TITLE notch protein - fruit fly (Drosophila melanogaster)
ALTERNATE_NAMES neurogenic repetitive locus protein
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Jun-1987 #sequence_revision 30-Jun-1987 #text_change
      07-Aug-1998
ACCESSIONS A24420; A24768; S09358; A05267
REFERENCE A24420
   #authors Kidd, S.; Kelley, M.R.; Young, M.W.
   #journal Mol. Cell. Biol. (1986) 6:3094-3108
   #cross-references MUID:87064624
   #accession A24420
      ##molecule_type DNA
      ##residues 1-2703 ##label KID
      ##cross-references GB:K03508; NID:g157991; PID:g157993
REFERENCE A24768
   #authors Wharton, K.A.; Johansen, K.M.; Xu, T.; Artavanis-Tsakonas, S.
   #journal Cell (1985) 43:567-581
   #cross-references MUID:86079539
   #accession A24768
      ##molecule_type mRNA
      ##residues 1-48,'T',50-118,'R',120-230,'I',232-256,'N',258-266,'A',
         268-872,'R',874-958,'R',960-1970,'FH',1973-2256,'G',
         2258-2264,'V',2266-2406,'R',2408-2444,'L',2446-2703
         ##label WHA1
      ##note the authors translated the codon ATC for residue 49 as
         Thr, ATT for residue 2044 as Arg, GTA for residue 2265
         as Ala, CGC for residue 2407 as His, and CTT for
         residue 2445 as Arg
REFERENCE S09358
   #authors Tautz, D.
   #journal Nucleic Acids Res. (1989) 17:6463-6471
   #title Hypervariability of simple sequences as a general source for
      polymorphic DNA markers.
   #cross-references MUID:89385974
   #accession S09358
      ##molecule_type DNA
      ##residues 2505-2551,'QQQQ',2552-2576,'E',2578-2604 ##label TAU
REFERENCE A05267
   #authors Wharton, K.A.; Yedvobnick, B.; Finnerty, V.G.;
      Artavanis-Tsakonas, S.
   #journal Cell (1985) 40:55-62
   #title opa: a novel family of transcribed repeats shared by the
      Notch locus and other developmentally regulated loci in D.
      melanogaster.
   #cross-references MUID:85099329
   #accession A05267
      ##molecule_type DNA
      ##residues 2504-2576,'E',2578-2611 ##label WHA2
GENETICS
   #gene notch; opa
      ##cross-references FlyBase:FBgn0004647
   #map_position 8.96-9.36
   #introns 53/3; 84/3; 171/3; 240/3; 283/3; 2333/3; 2436/3; 2588/3
CLASSIFICATION #superfamily notch protein; ankyrin repeat homology; EGF
      homology
KEYWORDS differentiation; tandem repeat; transmembrane protein
FEATURE
   27-43 #domain transmembrane #status predicted #label TMM1\
   568-599 #domain EGF homology #label EGF\
   1746-1762 #domain transmembrane #status predicted #label TMM2\
   1950-1982 #domain ankyrin repeat homology #label AN1\
   1983-2015 #domain ankyrin repeat homology #label AN2\
   1988-2004 #domain transmembrane #status predicted #label TMM3\
   2017-2049 #domain ankyrin repeat homology #label AN3\
   2050-2082 #domain ankyrin repeat homology #label AN4\
   2083-2115 #domain ankyrin repeat homology #label AN5\
   2538-2568 #region glutamine-rich\
   2538-2568 #domain neurogenic repetitive element #status predicted
      #label OPA
SUMMARY #length 2703 #molecular-weight 288876 #checksum 6404

Query Match 14.0%; Score 122; DB 2; Length 2703;
Best Local Similarity 40.5%; Pred. No. 3.79e-07;
Matches 17; Conservative 9; Mismatches 11; Indels 5; Gaps 5;

Db 952 SFPCQNGGTC-LDGIGD-YSCLCVDGFDGKHCETDINECLSQ 991

: |::| I : |: I III II I III : I |::; 
Qy 1 AFKCHHG-QCHISDRGEPY-CLCQPGFSGHHCEQE-NPCMGE 39 



RESULT 14
ENTRY A35844 #type complete
TITLE xotch protein - African clawed frog
ORGANISM #formal_name Xenopus laevis #common_name African clawed frog
DATE 12-Oct-1990 #sequence_revision 12-Oct-1990 #text_change
      14-Aug-1998
ACCESSIONS A35844
REFERENCE A35844
   #authors Coffman, C.; Harris, W.; Kintner, C.
   #journal Science (1990) 249:1438-1441
   #title Xotch, the Xenopus homolog of Drosophila notch.
   #cross-references MUID:90385285
   #accession A35844
      ##status preliminary; nucleic acid sequence not shown; not
         compared with conceptual translation
      ##molecule_type mRNA
      ##residues 1-2524 ##label COF
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
      repeat homology; EGF homology
KEYWORDS transmembrane protein
FEATURE
   222-254 #domain EGF homology #label EGF\
   1924-1956 #domain ankyrin repeat homology #label AN1\
   1957-1989 #domain ankyrin repeat homology #label AN2\
   1991-2023 #domain ankyrin repeat homology #label AN3\
   2024-2056 #domain ankyrin repeat homology #label AN4\
   2057-2089 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2524 #molecular-weight 274931 #checksum 9441

Query Match 13.9%; Score 121; DB 2; Length 2524;
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Best Local Similarity 45.0*; Pred. No. 5,69e-07; 

Matches 18; Conservative 8; Mismatches 9; Indels 5; Gaps 5; 

Db 1237 KCFNNGKC-I-DRVGGYNCICPPGFVGERCEGDVNECLSN 1274

II ::l I I II I 1:1 III I :|| : I |::: 
Qy 3 KC-HHGQCHISDRGEPY-CLCQPGFSGHHCEQE-NPCMGE 39



RESULT 15 

ENTRY A38096 #type complete
TITLE perlecan precursor - human
ALTERNATE_NAMES basement membrane heparan sulfate proteoglycan; heparan
      sulfate proteoglycan 2
ORGANISM #formal_name Homo sapiens #common_name man
DATE 07-Apr-1994 #sequence_revision 07-Apr-1994 #text_change
      15-Jan-1999
ACCESSIONS A38096; S19256; S77946; A41059; A40306; B33625; A33625;
      A41736
REFERENCE A38096

tauthors Murdoch, A.D.; Dodge, G.R.; Cohen, I.; Tuan, R.S,; lozzo, 
R.V. 

tjournal J. Biol. Chem. (1992) 267:8544-8557 
ttitle Primary structure of the human heparan sulfate proteoglycan 
from basement membrane (HSPG2/perlecan) . A chimeric 
molecule with multiple domains homologous -to the low 
density lipoprotein receptor, laminin, neural cell adhesion 
molecules, and epidermal growth factor, 
   #cross-references MUID:92235084
   #accession A38096
      ##molecule_type mRNA
      ##residues 1-4391 ##label MUR
      ##cross-references GB:M85289; NID:g184426; PID:g184427
REFERENCE A41736
   #authors Kallunki, P.; Tryggvason, K.
   #journal J. Cell Biol. (1992) 116:559-571
   #title Human basement membrane heparan sulfate proteoglycan core
      protein: a 467-kD protein containing multiple domains
      resembling elements of the low density lipoprotein
      receptor, laminin, neural cell adhesion molecules, and
      epidermal growth factor.
   #cross-references MUID:92112994
   #accession S19256
      ##molecule_type mRNA

ttresidues 1-57, 'D' ,59-434, 'A' ,436, 'FL' ,438-449, 'Q', 451-502, 'A' , 

• 503-792, T ,794-908, 'R' ,910-1702, 'RG' ,1705-1752, 'R', 
1754-2037, 'I', 2039-2049, 'Q', 2050-2051,2053-2092, 'H', 
2094-2626, 'R', 2628-2769, 'Y', 2771-2979, 'H' ,2981-2994, 
'G', 2996-3167, 'T', 3169-3240, 'R', 3242-3426, 'R', 
3428-3631, 'Q', 3633-3967, 'S',3969-4003, 'T',4005-4134, 
'I', 4136-4331, 'I', 4333-4391 ttlabel KAL 
ttcross-references EMBL:X62515 

REFERENCE S77946
   #authors Tryggvason, K.
   #submission submitted to the EMBL Data Library, October 1991
   #accession S77946
      ##molecule_type mRNA

ttresidues 1-57, 'D', 59-434, 'A', 436, 'FL', 438-449, 'Q', 451-502, 'A', 
503-792, X ,794-908, 'R' ,910-1702, 'HG' ,1705-1752, 'R', 
1754-2037, 'I', 2039-2049, 'Q' ,2050-2051,2053-2092, 'H', 
2094-2626, 'R', 2628-2769, 'Y', 2771-2979, 'H', 2981-2994, 
'G' ,2996-3167, 'T' ,3169-3240, 'R' ,3242-3426, 'R' , 
3428-3631,'Q', 3633-4003, 'T',4005-4134, 'I', 4136-4331, 
'I', 4333-4391 ttlabel TRY 
      ##cross-references EMBL:X62515; NID:g29469; PID:g29470
REFERENCE A41059
   #authors Kallunki, P.; Eddy, R.L.; Byers, M.G.; Kestilae, M.; Shows,
      T.B.; Tryggvason, K.
   #journal Genomics (1991) 11:389-396
   #title Cloning of human heparan sulfate proteoglycan core protein,
      assignment of the gene (HSPG2) to 1p36.1->p35 and
      identification of a BamHI restriction fragment length
      polymorphism.
   #cross-references MUID:92120660
   #accession A41059
      ##molecule_type mRNA
      ##residues 'RT',892-908,'R',910-1101,'L',1103-1132,'L',1134-1221,
         'L',1223-1397 ##label KA2
      ##cross-references GB:S76436; NID:g243370; PID:g243371
REFERENCE A40306
   #authors Dodge, G.R.; Kovalszky, I.; Chu, M.L.; Hassell, J.R.;
      McBride, O.W.; Yi, H.F.; Iozzo, R.V.
   #journal Genomics (1991) 10:673-680
   #title Heparan sulfate proteoglycan of human colon: partial
      molecular cloning, cellular expression, and mapping of the
      gene (HSPG2) to the short arm of human chromosome 1.
   #cross-references MUID:91365376
   #accession A40306
      ##molecule_type mRNA
      ##residues 1018-1405,'G',1407-1409,'G',1411-1465 ##label DOD
      ##cross-references GB:M64283; NID:g184424; PID:g184425

REFERENCE A33625
   #authors Heremans, A.; van der Schueren, B.; De Cock, B.; Paulsson,
      M.; Cassiman, J.J.; van den Berghe, H.; David, G.
   #journal J. Cell Biol. (1989) 109:3199-3211
   #title Matrix-associated heparan sulfate proteoglycan: core
      protein-specific monoclonal antibodies decorate the
      pericellular matrix of connective tissue cells and the
      stromal side of basement membranes.
   #cross-references MUID:90078352
   #accession B33625
      ##molecule_type protein
      ##residues 1379-1384,'X',1386-1388,'X',1390-1398 ##label HE2
   #accession A33625
      ##molecule_type protein
      ##residues 2166-2171,'X',2173-2175,'X',2177-2185 ##label HE3
      ##note peptide potentially matches four different regions of
         sequence shown
GENETICS
   #gene GDB:HSPG2
      ##cross-references GDB:126372; OMIM:142461
   #map_position lp36,Mp36,l
CLASSIFICATION #superfamily LDL receptor ligand-binding repeat homology;
      laminin G repeat homology; laminin-type EGF-like homology
KEYWORDS chondroitin sulfate proteoglycan; glycoprotein; heparan
      sulfate; transmembrane protein
FEATURE
   1-21 #domain signal sequence #status predicted #label SIG\
   22-4391 #product perlecan #status predicted #label mat\
   22-193 #domain I #label DOM1\
   194-530 #domain II #label DOM2\
   199-234 #domain LDL receptor ligand-binding repeat homology
      #label LDL1\
   285-319 #domain LDL receptor ligand-binding repeat homology
      #label LDL2\
   325-359 #domain LDL receptor ligand-binding repeat homology
      #label LDL3\
   368-403 #domain LDL receptor ligand-binding repeat homology
      #label LDL4\
   531-1676 #domain III #label DOM3\
   1563-1610 #domain laminin-type EGF-like homology #label EG7\
   1677-3686 #domain IV #label DOM4\
   2007-2034 #domain transmembrane #status predicted #label trm\
   3687-4391 #domain V #label DOM5\
   3953-4106 #domain laminin G repeat homology #label LG2\
   4149-4151 #region motor neuron attachment (L-R-E) motif\
   4299-4301 #region motor neuron attachment (L-R-E) motif\
   65,71,76 #binding_site heparan sulfate (Ser) (covalent) #status
      predicted\
   89,554,1755,2121,3072,3105,3279,3780,3836,4068
      #binding_site carbohydrate (Asn) (covalent) #status
      predicted\
   2995,3933,4179 #binding_site chondroitin sulfate (Ser) (covalent)
      #status predicted
SUMMARY #length 4391 #molecular-weight 468819 #checksum 7166
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Query Match 13.5*; Score 118; DB 2; Length 4391; 

Best Local Similarity 45.2%; Pred. No. 1.91e-06;

Matches 14; Conservative 10; Mismatches 4; Indels 3; Gaps 3; 

Db 3853 CQNGGQCHDSESS-SYVCVCPAGFTGSRCEH 3882 

|::| III I: : :| hi :||:| :||: 
Qy 4 CHHG-QCHISDRGEPY-CLCQPGFSGHHCEQ 32



Search completed; Fri May 28 09:25:09 1999 
Job time : 18 sees. 
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Release 3.1A John F, Collins, Biocomputing Research Unit, 
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 



MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 09:25:28 1999; MasPar time 5.14 Seconds 

605.413 Million cell updates/sec 

Tabular output not generated. 

Title: >US-09-191-647-11

Description: (1-110) from US09191647.pep

Perfect Score: 873 

Sequence: 1 AFKCHHGQCHISDRGEPYCL GSSFVEEVERHLECGCRACS 110 

Scoring table: PAM 150 
Gap 11 

Searched: 77977 seqs, 28268293 residues

Post-processing: Minimum Match 0%
Listing first 45 summaries

Database: swiss-prot37
1:swissprot

Statistics: Mean 38.241; Variance 58.972; scale 0.648 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
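
For readers unfamiliar with the method named in the header, the following is a minimal sketch of Smith-Waterman local alignment scoring. It is not the program that produced this report. It assumes a flat match/mismatch score in place of the PAM 150 table and treats the header's "Gap 11" as a linear per-residue gap penalty, which is an assumption about how that parameter is applied; the function name `smith_waterman_score` and the `match` and `mismatch` values are illustrative.

```python
def smith_waterman_score(a, b, match=5, mismatch=-4, gap=11):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell update takes the best of: start fresh (0), extend the
            # diagonal, or open/extend a gap in either sequence.
            cur[j] = max(0, prev[j - 1] + sub, prev[j] - gap, cur[j - 1] - gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman_score("XXCLCQPGFYY", "CLCQPGF"))
```

Each inner-loop iteration is one "cell update" in the sense of the throughput figure reported in the header, so the total work is the query length multiplied by the number of database residues.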



                                 SUMMARIES

Result         Query
   No.  Score  Match  Length  ID          Description             Pred. No.

     1    167   19.1    1480  SLIT_DROME  SLIT PROTEIN PRECURSOR  7.49e-18
     2    133   15.2    2531  NTC1_MOUSE  NEUROGENIC LOCUS NOTCH  1.34e-10
     3    131   15.0    2437  NOTC_BRARE  NEUROGENIC LOCUS NOTCH  3.44e-10
     4    127   14.5     375  CE10_CHICK  CEF-10 PROTEIN PRECURS  2.23e-09
     5    126   14.4     379  CYR6_MOUSE  CYR61 PROTEIN PRECURSO  3.54e-09
     6    126   14.4    5147  FAT_DROME   CADHERIN-RELATED TUMOR  3.54e-09
     7    125   14.3    2139  CRB_DROME   CRUMBS PROTEIN PRECURS  5.62e-09
     8    125   14.3    2444  NTC1_HUMAN  NEUROGENIC LOCUS NOTCH  5.62e-09
     9    125   14.3    2531  NTC1_RAT    NEUROGENIC LOCUS NOTCH  5.62e-09
    10    122   14.0    2703  NOTC_DROME  NEUROGENIC LOCUS NOTCH  2.23e-08
    11    121   13.9    1964  NTC4_MOUSE  NEUROGENIC LOCUS NOTCH  3.52e-08
    12    121   13.9    2524  NOTC_XENLA  NEUROGENIC LOCUS NOTCH  3.52e-08
    13    118   13.5    4393  PGBM_HUMAN  BASEMENT MEMBRANE-SPEC  1.37e-07
    14    115   13.2    1429  LI12_CAEEL  LIN-12 PROTEIN PRECURS  5.24e-07
    15    114   13.1     381  CYR6_HUMAN  CYR61 PROTEIN PRECURSO  8.18e-07
    16    114   13.1    2318  NTC3_MOUSE  NEUROGENIC LOCUS NOTCH  8.18e-07
    17    113   12.9     177  BTC_MOUSE   BETACELLULIN PRECURSOR  1.27e-06
    18    112   12.8     723  DLL1_HUMAN  DELTA-LIKE PROTEIN 1 P  1.98e-06
    19    110   12.6     612  LEM2_MOUSE  E-SELECTIN PRECURSOR (  4.76e-06
    20    110   12.6    1955  AGRI_CHICK  AGRIN PRECURSOR.        4.76e-06
    21    109   12.5     427  MFGM_BOVIN  MILK FAT GLOBULE-EGF F  7.35e-06
    22    109   12.5    2871  FBN1_HUMAN  FIBRILLIN 1 PRECURSOR.  7.35e-06
    23    109   12.5    2871  FBN1_MOUSE  FIBRILLIN 1 PRECURSOR.  7.35e-06
    24    108   12.4     383  DLK_HUMAN   DELTA-LIKE PROTEIN PRE  1.13e-05
    25    108   12.4    2415  PGCA_HUMAN  AGGRECAN CORE PROTEIN   1.13e-05
    26    107   12.3     385  DLK_MOUSE   DELTA-LIKE PROTEIN PRE  1.75e-05
    27    107   12.3     611  LEM2_CANFA  E-SELECTIN PRECURSOR (  1.75e-05
    28    107   12.3    1295  GLP1_CAEEL  GLP-1 PROTEIN PRECURSO  1.75e-05
    29    106   12.1      84  HBGF_PIG    HEPARIN-BINDING EGF-LI  2.68e-05
    30    106   12.1     121  TGFA_MACMU  TRANSFORMING GROWTH FA  2.68e-05
    31    106   12.1     159  TGFA_RAT    TRANSFORMING GROWTH FA  2.68e-05
    32    106   12.1     160  TGFA_HUMAN  TRANSFORMING GROWTH FA  2.68e-05
    33    106   12.1     160  TGFA_PIG    TRANSFORMING GROWTH FA  2.68e-05
    34    106   12.1     178  BTC_HUMAN   BETACELLULIN PRECURSOR  2.68e-05
    35    106   12.1     208  HBGF_HUMAN  HEPARIN-BINDING EGF-LI  2.68e-05
    36    106   12.1     208  HBGF_CERAE  HEPARIN-BINDING EGF-LI  2.68e-05
    37    106   12.1    2871  FBN1_BOVIN  FIBRILLIN 1 PRECURSOR   2.68e-05
    38    105   12.0     159  TGFA_MOUSE  TRANSFORMING GROWTH FA  4.11e-05
    39    105   12.0    3707  PGBM_MOUSE  BASEMENT MEMBRANE-SPEC  4.11e-05
    40    105   12.0    4543  LRP1_CHICK  LOW-DENSITY LIPOPROTEI  4.11e-05
    41    104   11.9      50  TGFA_RABIT  TRANSFORMING GROWTH FA  6.30e-05
    42    103   11.8     133  TGFA_SHEEP  TRANSFORMING GROWTH FA  9.62e-05
    43    103   11.8     140  GRFA_VACCV  GROWTH FACTOR.          9.62e-05
    44    103   11.8     969  PAC4_HUMAN  SUBTILISIN-LIKE PROTEA  9.62e-05
    45    103   11.8    2813  VWF_HUMAN   VON WILLEBRAND FACTOR   9.62e-05
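
The Query Match column in the summaries appears to be the hit score expressed as a percentage of the Perfect Score (873) given in the run header. This reading is an inference from the listed figures, not a statement from the program's documentation; the short check below, with the illustrative name `query_match`, reproduces several listed values under that assumption.

```python
PERFECT_SCORE = 873  # "Perfect Score" from the run header for this query


def query_match(score, perfect=PERFECT_SCORE):
    """Score as a percentage of the perfect score, to one decimal place."""
    return round(100.0 * score / perfect, 1)


# (score, listed Query Match) pairs taken from the summaries.
listed = [(167, 19.1), (133, 15.2), (131, 15.0), (127, 14.5),
          (118, 13.5), (106, 12.1), (103, 11.8)]
for score, expected in listed:
    print(score, query_match(score), expected)
```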



ALIGNMENTS

RESULT 1
ID   SLIT_DROME     STANDARD;      PRT;  1480 AA.
AC   P24014;
DT   01-MAR-1992 (REL. 21, CREATED)
DT   01-MAR-1992 (REL. 21, LAST SEQUENCE UPDATE)
DT   01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE)
DE   SLIT PROTEIN PRECURSOR.
GN   SLI.
OS   DROSOPHILA MELANOGASTER (FRUIT FLY).
OC   EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC   PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC   DROSOPHILIDAE; DROSOPHILA.
RN   [1]
RP   SEQUENCE FROM N.A.
RX   MEDLINE; 91099665.
RA   ROTHBERG J.M., JACOBS J.R., GOODMAN C.S., ARTAVANIS-TSAKONAS S.;
RT   "Slit: an extracellular protein necessary for development of midline
RT   glia and commissural axon pathways contains both EGF and LRR
RT   domains.";
RL   GENES DEV. 4:2169-2187(1990).
CC   -!- FUNCTION: NECESSARY FOR DEVELOPMENT OF MIDLINE GLIA AND
CC       COMMISSURAL AXON PATHWAYS. SLIT MAY INTERACT WITH EXTRACELLULAR
CC       MATRIX MOLECULES.
CC   -!- TISSUE SPECIFICITY: EXCRETED BY THE MIDLINE GLIA CELLS AND
CC       EVENTUALLY DISTRIBUTED ALONG THE AXONS.
CC   -!- ALTERNATIVE PRODUCTS: GIVES RISE TO 2 DISTINCT PROTEINS DIFFERING
CC       BY 11 AA AT THE C-TERMINUS OF THE LAST EGF REPEAT.
CC   -!- SIMILARITY: CONTAINS 7 EGF-LIKE DOMAINS.
CC   -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC       MANY PROTEINS. NUMBER IN THIS PROTEIN: 22, TWO BLOCKS OF 6 LRR'S
CC       AND TWO BLOCKS OF 5 LRR'S.
CC   -!- SIMILARITY: CONTAINS A C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK).
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license@isb-sib.ch).
DR   EMBL; X53959; G8615; -.
DR   PIR; A36665; A36665.
DR   FLYBASE; FBgn0003425; sli.
DR   PROSITE; PS00010; ASX_HYDROXYL; 3.
DR   PROSITE; PS00022; EGF_1; 7.
DR   PROSITE; PS01185; CTCK_1; 1.
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US-09-191-647-ll.rsp 
DR PROSITE; PS01186; EGF_2; 5.
DR PROSITE; PS01187; EGF_CA; 2.
DR PROSITE; PS01225; CTCK_2; 1.
DR PFAM; PF00007; Cys_knot; 1.
DR PFAM; PF00008; EGF; 7.
DR PFAM; PF00054; laminin_G; 1.
DR PFAM; PF00560; LRR; 10.
DR HSSP; P00740; 1IXA.
KW NEUROGENESIS; GLYCOPROTEIN; SIGNAL; ALTERNATIVE SPLICING;
KW EGF-LIKE DOMAIN; REPEAT; LEUCINE-RICH REPEAT; DUPLICATION.



FT 






36 












SLIT PROTEIN. 


FT   DOMAIN       70    104       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      105    230       LEUCINE-RICH REPEATS (1ST REGION).
FT   DOMAIN      231              CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN      295    326       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      327    452       LEUCINE-RICH REPEATS (2ND REGION).
FT   DOMAIN      453    518       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN      519    550       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      551    653       LEUCINE-RICH REPEATS (3RD REGION).
FT   DOMAIN      654    714       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN      715    746       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      747    848       LEUCINE-RICH REPEATS (4TH REGION).
FT   DOMAIN      849    910
FT   REPEAT      105    115       LRR 1-1.
FT   REPEAT      116    139       LRR 1-2.
FT   REPEAT      140    163       LRR 1-3.
FT   REPEAT      164    187       LRR 1-4.
FT   REPEAT      188    211       LRR 1-5.
FT   REPEAT      212    230       LRR 1-6.
FT   REPEAT      327    337       LRR 2-1.
FT   REPEAT      338    361       LRR 2-2.
FT   REPEAT      362    385       LRR 2-3.
FT   REPEAT      386    409       LRR 2-4.
FT   REPEAT      410    433       LRR 2-5.
FT   REPEAT      434    452       LRR 2-6.
FT   REPEAT      551    562       LRR 3-1.
FT   REPEAT      563    586       LRR 3-2.
FT   REPEAT      587    610       LRR 3-3.
FT   REPEAT      611    634       LRR 3-4.
FT   REPEAT      635    653       LRR 3-5.
FT   REPEAT      747    757       LRR 4-1.
FT   REPEAT      758    781       LRR 4-2.
FT   REPEAT      782    805       LRR 4-3.
FT   REPEAT      806    829       LRR 4-4.
FT   REPEAT      830    848       LRR 4-5.
FT   DOMAIN      907    944
FT   DOMAIN      946    983
FT   DOMAIN      985   1022       EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1024   1062
FT   DOMAIN     1064   1100
FT   DOMAIN     1111   1149
FT   DOMAIN     1353   1392
FT   DOMAIN     1409   1480       CTCK.
FT   VARSPLIC   1394   1404       MISSING (IN SHORT FORM).
FT   CARBOHYD    111    111       POTENTIAL.
FT   CARBOHYD    207    207       POTENTIAL.
FT   CARBOHYD    357    357       POTENTIAL.
FT   CARBOHYD    435    435       POTENTIAL.
FT   CARBOHYD    783    783       POTENTIAL.
FT   CARBOHYD    788    788       POTENTIAL.
FT   CARBOHYD    958    958       POTENTIAL.
FT   CARBOHYD    998    998       POTENTIAL.
FT   CARBOHYD   1060   1060       POTENTIAL.
FT   CARBOHYD   1159   1159       POTENTIAL.
FT   CARBOHYD   1175   1175       POTENTIAL.
FT   CARBOHYD   1243   1243       POTENTIAL.
FT   CARBOHYD   1292   1292       POTENTIAL.
FT   DISULFID    911    922       BY SIMILARITY.
FT   DISULFID    916    932       BY SIMILARITY.
FT   DISULFID    934    943       BY SIMILARITY.
FT   DISULFID    950    961       BY SIMILARITY.
FT   DISULFID    955    971       BY SIMILARITY.
FT   DISULFID    973    982       BY SIMILARITY.
FT   DISULFID    989   1001       BY SIMILARITY.
FT   DISULFID    995   1010       BY SIMILARITY.
FT   DISULFID   1012   1021       BY SIMILARITY.
FT   DISULFID   1028   1041       BY SIMILARITY.
FT   DISULFID   1035   1050       BY SIMILARITY.
FT   DISULFID   1052   1061       BY SIMILARITY.
FT   DISULFID   1068   1079       BY SIMILARITY.
FT   DISULFID   1073   1088       BY SIMILARITY.
FT   DISULFID   1090   1099       BY SIMILARITY.
FT   DISULFID   1115   1125       BY SIMILARITY.
FT   DISULFID   1120   1137       BY SIMILARITY.
FT   DISULFID   1139   1148       BY SIMILARITY.
FT   DISULFID   1357   1368       BY SIMILARITY.
FT   DISULFID   1362   1380       BY SIMILARITY.
FT   DISULFID   1382   1391       BY SIMILARITY.
FT   DISULFID   1409   1443       BY SIMILARITY.
FT   DISULFID   1423   1457       BY SIMILARITY.
FT   DISULFID   1434   1473       BY SIMILARITY.
FT   DISULFID   1438   1475       BY SIMILARITY.
FT   DISULFID   1442   1479       BY SIMILARITY.
SQ   SEQUENCE 1480 AA; 165752 MW; 2CD1C421 CRC32;



Query Match 19.1%; Score 167; DB 1; Length 1480; 

Best Local Similarity 29.4%; Pred. No. 7.49e-18; 

Matches 35; Conservative 25; Mismatches 47; Indels 12; Gaps 12; 

Db 1361 KCRRGSRCVPNSNARDGYQCKCKHGQRGRYCDQGEGSTEPPTVTAASTCRKEQVREYYTE 1420 

I::! :| I: : II I I I: 1:1 I : II |: I ::| : 
Qy 3 KCHHG-QC-HISDRGEPY-CLCQPGFSGHHCEQ-ENPCMGEIVREA-I-RR-Q-KDY-AS 53

Db 1421 NDCRSRQPLKYAKCVGGCGNQCCAAKIVRRRKVRMVCSNNRKYIKNLDIVRKCGCTKKC 1479 

I: I:: I Nil II : :||| |:: :: ::: III : |< 
Qy 54 CATASKVPIM-E-CRGGCGTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGC-RAC 109 



RESULT 2 

ID NTC1_MOUSE STANDARD; PRT; 2531 AA.

AC Q01705; 

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1 PRECURSOR (MOTCH PROTEIN).

GN NOTCH1 OR MOTCH.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=EMBRYO;

RX MEDLINE; 93194170.

RA FRANCO DEL AMO F., GENDRON-MAGUIRE M., SWIATEK P.J., JENKINS N.A.,

RA COPELAND N.G., GRIDLEY T.;

RT "Cloning, analysis, and chromosomal localization of Notch-1, a mouse

RT homolog of Drosophila Notch.";

RL GENOMICS 15:259-264(1993).

RN [2] 

RP SEQUENCE OF 1551-2170 FROM N.A. 

RC TISSUE=EMBRYO;

RX MEDLINE; 93048835.

RA FRANCO DEL AMO F., SMITH D.E., SWIATEK P.J., GENDRON-MAGUIRE M.,

RA GREENSPAN R.J., MCMAHON A.P., GRIDLEY T.;

RT "Expression pattern of Motch, a mouse homolog of Drosophila Notch,

RT suggests an important role in early postimplantation mouse

RT development."; 

RL DEVELOPMENT 115:737-744(1992). 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN. 

CC -!- DEVELOPMENTAL STAGE: EXPRESSED ALMOST UNIFORMLY IN EARLY EMBRYOS.

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS. 

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS. 

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS. 

CC 
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; Z11886; G288503; -.
DR MGD; MGI:97363; NOTCH1.
DR PROSITE; PS00010; ASX_HYDROXYL; 22.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 27.
DR PROSITE; PS01187; EGF_CA; 21.
DR PFAM; PF00008; EGF; 35.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.



FT   SIGNAL        1     18       POTENTIAL.
FT   CHAIN        19   2531       NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1.
FT   DOMAIN       19   1725       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM   1726   1746       POTENTIAL.
FT   DOMAIN     1747   2531       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       24   1425       36 X EGF-TYPE REPEATS.
FT   DOMAIN     1449   1462       CYS-RICH.
FT   DOMAIN     1445   1562       3 X LIN/NOTCH REPEATS.
FT   REPEAT     1445   1480       LIN/NOTCH 1.
FT   REPEAT     1481   1522       LIN/NOTCH 2.
FT   REPEAT     1523   1562       LIN/NOTCH 3.
FT   DOMAIN     1865   2075       6 X ANK MOTIF REPEATS.
FT   REPEAT     1865   1910       ANK MOTIF 1.
FT   REPEAT     1912   1942       ANK MOTIF 2.
FT   REPEAT     1944   1975       ANK MOTIF 3.
FT   REPEAT     1978   2009       ANK MOTIF 4.
FT   REPEAT     2011   2042       ANK MOTIF 5.
FT   REPEAT     2044   2075       ANK MOTIF 6.
FT   CARBOHYD    888    888       POTENTIAL.
FT   CARBOHYD    959    959       POTENTIAL.
FT   CARBOHYD   1179   1179       POTENTIAL.
FT   CARBOHYD   1241   1241       POTENTIAL.
FT   CARBOHYD   1489   1489       POTENTIAL.
FT   CARBOHYD   1587   1587       POTENTIAL.
SQ   SEQUENCE 2531 AA; 271312 MW; AD71189B CRC32;




Query Match 15.2%; Score 133; DB 1; Length 2531;

Best Local Similarity 25.4%; Pred. No. 1.34e-10;
Matches 29; Conservative 29; Mismatches 47; Indels 9; Gaps 9; 

Db 575 CHYGSCK-DGVATFTCLCQPGYTGHHCETNINECHSQPCRHGGTCQDRDNSYLCLCLKGT 633 

II I I : llllll::|llll : I I :: I : : |:: I I :: 
Qy 4 CHHGQCHISDRGEPYCLCQPGFSGHHCEQE-NPCMGEIVREA-I-R-RQKDYAS-CATAS 58 

Db 634 TGPNCEINLDDCASNPCDSGTCLDKIDGYECACEPGYTGSMCNVNIDECAGSPC 687 

I I I:: I:: : ::|: ::: : : :: ||: :| 
Qy 59 KVPIMECR-GGCGTTCCQPIRSKRRKYVFQCTDGSSFVEEV-ERHL-ECGCRAC 109



RESULT 3 

ID NOTC_BRARE STANDARD; PRT; 2437 AA.

AC P46530; 

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN PRECURSOR. 

GN NOTCH. 

OS BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII; 

OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA; 

OC CYPRINIDAE; RASBORINAE; DANIO. 

RN [1] 

RP SEQUENCE FROM N.A.

RC TISSUE=EMBRYO;

RX MEDLINE; 94128602.

RA BIERKAMP C., CAMPOS-ORTEGA J.A.;

RT "A zebrafish homologue of the Drosophila neurogenic gene Notch and

RT its pattern of transcription during early embryogenesis.";

RL MECH. DEV. 43:87-100(1993).

CC -!- FUNCTION: IMPLICATED IN CELL FATE SPECIFICATIONS DURING
CC EMBRYO DEVELOPMENT. MAY BE INVOLVED IN THE FORMATION OF THE
CC NEURAL PLATE, NOTOCHORD AND BRAIN VESICLES.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- DEVELOPMENTAL STAGE: EXPRESSED IN ALL CELLS IN PREGASTRULATION
CC STAGES. DURING GASTRULATION IS DIFFERENTIALLY EXPRESSED,
CC ACCUMULATING PREDOMINANTLY IN THE PRECHORDAL MESODERM AND
CC NOTOCHORD. AT THE END OF GASTRULATION, EXPRESSED ALONG THE
CC ANTERIOR-POSTERIOR AXIS INCLUDING THE DEVELOPING NEURAL PLATE
CC AND DIFFERENTIATING MESODERM. ALSO PRESENT IN THE DEVELOPING
CC BRAIN AND HEAD REGIONS.

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; X69088; G433867; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 23.

DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 28.

DR PROSITE; PS01187; EGF_CA; 22.

DR PFAM; PF00008; EGF; 36.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3.

DR HSSP; P00740; 1IXA.

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.

FT   SIGNAL        1     20       POTENTIAL.
FT   CHAIN        21   2437       NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN.
FT   DOMAIN       21   1724       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM   1725   1747       POTENTIAL.
FT   DOMAIN     1748   2437       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       21     57       EGF-LIKE 1.
FT   DOMAIN       58     98       EGF-LIKE 2.
FT   DOMAIN      101    138       EGF-LIKE 3.
FT   DOMAIN      139    175       EGF-LIKE 4.
FT   DOMAIN      177    215       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      217    254       EGF-LIKE 6.
FT   DOMAIN      256    292       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      294    332       EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      334    370       EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      371    409       EGF-LIKE 10.
FT   DOMAIN      411    449       EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      451    487       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      489    524       EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      526    562       EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      564    599       EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      601    637       EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      639    674       EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      676    712       EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      714    749       EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      751    787       EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      789    825       EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      827    865       EGF-LIKE 22.
FT   DOMAIN      867    903       EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      905    941       EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      943    979       EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      981   1017       EGF-LIKE 26.
FT   DOMAIN     1019   1055       EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1057   1093       EGF-LIKE 28.
FT   DOMAIN     1095   1141       EGF-LIKE 29.
FT   DOMAIN     1143   1179       EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1181   1217       EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1219   1263       EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1265   1303       EGF-LIKE 33.
FT   DOMAIN     1305   1344       EGF-LIKE 34.
FT   DOMAIN     1346   1382       EGF-LIKE 35.
FT   DOMAIN     1385   1423       EGF-LIKE 36.
FT   DOMAIN     1446   1561       3 X LIN/NOTCH REPEATS.
FT   REPEAT     1446   1486       LIN/NOTCH 1.
FT   REPEAT     1487   1520       LIN/NOTCH 2.
FT   REPEAT     1521   1561       LIN/NOTCH 3.
FT   DOMAIN     1861   2074       6 X ANK MOTIF REPEATS.
FT   REPEAT     1861   1891       ANK MOTIF 1.
FT   REPEAT     1892   1940       ANK MOTIF 1.
FT   REPEAT     1941   1974       ANK MOTIF 1.
FT   REPEAT     1975   2007       ANK MOTIF 1.
FT   REPEAT     2008   2040       ANK MOTIF 1.
FT   REPEAT     2041   2074       ANK MOTIF 1.
FT   DOMAIN     2265   2276       POLY-GLN (OPA-REPEAT).


FT   DISULFID     25     35       BY SIMILARITY.
FT   DISULFID     29     45       BY SIMILARITY.
FT   DISULFID     47     56       BY SIMILARITY.
FT   DISULFID     62     73       BY SIMILARITY.
FT   DISULFID     67     86       BY SIMILARITY.
FT   DISULFID     88     97       BY SIMILARITY.
FT   DISULFID    105    116       BY SIMILARITY.
FT   DISULFID    110    126       BY SIMILARITY.
FT   DISULFID    128    137       BY SIMILARITY.
FT   DISULFID    143    154       BY SIMILARITY.
FT   DISULFID    148    163       BY SIMILARITY.
FT   DISULFID    165    174       BY SIMILARITY.
FT   DISULFID    181    194       BY SIMILARITY.
FT   DISULFID    188    203       BY SIMILARITY.
FT   DISULFID    205    214       BY SIMILARITY.
FT   DISULFID    221    232       BY SIMILARITY.
FT   DISULFID    226    242       BY SIMILARITY.
FT   DISULFID    244    253       BY SIMILARITY.
FT   DISULFID    260    271       BY SIMILARITY.
FT   DISULFID    265    280       BY SIMILARITY.
FT   DISULFID    282    291       BY SIMILARITY.
FT   DISULFID    298    311       BY SIMILARITY.
FT   DISULFID    305    320       BY SIMILARITY.
FT   DISULFID    322    331       BY SIMILARITY.
FT   DISULFID    338    349       BY SIMILARITY.
FT   DISULFID    343    358       BY SIMILARITY.
FT   DISULFID    360    369       BY SIMILARITY.
FT   DISULFID    375    386       BY SIMILARITY.
FT   DISULFID    380    397       BY SIMILARITY.
FT   DISULFID    399    408       BY SIMILARITY.
FT   DISULFID    415    428       BY SIMILARITY.
FT   DISULFID    422    437       BY SIMILARITY.
FT   DISULFID    439    448       BY SIMILARITY.
FT   DISULFID    455    466       BY SIMILARITY.
FT   DISULFID    460    475       BY SIMILARITY.
FT   DISULFID                      BY SIMILARITY.
FT   DISULFID    493    503       BY SIMILARITY.
FT   DISULFID    498    512       BY SIMILARITY.
FT   DISULFID    514    523       BY SIMILARITY.
FT   DISULFID    530    541       BY SIMILARITY.
FT   DISULFID    535    550       BY SIMILARITY.
FT   DISULFID    552    561       BY SIMILARITY.
FT   DISULFID    568    578       BY SIMILARITY.
FT   DISULFID    573    587       BY SIMILARITY.
FT   DISULFID    589    598       BY SIMILARITY.
FT   DISULFID    605    616       BY SIMILARITY.
FT   DISULFID    610    625       BY SIMILARITY.
FT   DISULFID    627    636       BY SIMILARITY.
FT   DISULFID    643    653       BY SIMILARITY.
FT   DISULFID    648    662       BY SIMILARITY.
FT   DISULFID    664    673       BY SIMILARITY.
FT   DISULFID    680    691       BY SIMILARITY.
FT   DISULFID    685    700       BY SIMILARITY.
FT   DISULFID    702    711       BY SIMILARITY.
FT   DISULFID    718    728       BY SIMILARITY.
FT   DISULFID    723    737       BY SIMILARITY.
FT   DISULFID                      BY SIMILARITY.
FT   DISULFID            766       BY SIMILARITY.
FT   DISULFID    760              BY SIMILARITY.
FT   DISULFID            786       BY SIMILARITY.
FT   DISULFID    793    804       BY SIMILARITY.
FT   DISULFID                      BY SIMILARITY.
FT   DISULFID    815    824       BY SIMILARITY.
FT   DISULFID    831              BY SIMILARITY.
FT   DISULFID    836              BY SIMILARITY.
FT   DISULFID                      BY SIMILARITY.
FT   DISULFID    871    882       BY SIMILARITY.
FT   DISULFID    876              BY SIMILARITY.
FT   DISULFID    893    902       BY SIMILARITY.
FT   DISULFID    909    920       BY SIMILARITY.
FT   DISULFID    914    929       BY SIMILARITY.
FT   DISULFID    931    940       BY SIMILARITY.
FT   DISULFID    947    958       BY SIMILARITY.
FT   DISULFID    952    967       BY SIMILARITY.
FT   DISULFID    969    978       BY SIMILARITY.
FT   DISULFID   1023   1034       BY SIMILARITY.
FT   DISULFID   1028   1043       BY SIMILARITY.
FT   DISULFID   1045   1054       BY SIMILARITY.
FT   DISULFID   1061   1072       BY SIMILARITY.
FT   DISULFID   1066   1081       BY SIMILARITY.
FT   DISULFID   1083   1092       BY SIMILARITY.
FT   DISULFID   1099   1120       BY SIMILARITY.
FT   DISULFID   1114   1129       BY SIMILARITY.
FT   DISULFID   1131   1140       BY SIMILARITY.
FT   DISULFID   1147   1158       BY SIMILARITY.
FT   DISULFID   1152   1167       BY SIMILARITY.
FT   DISULFID   1169   1178       BY SIMILARITY.
FT   DISULFID   1185   1196       BY SIMILARITY.
FT   DISULFID   1190   1205       BY SIMILARITY.
FT   DISULFID   1207   1216       BY SIMILARITY.
FT   DISULFID   1223   1242       BY SIMILARITY.
FT   DISULFID   1236   1251       BY SIMILARITY.
FT   DISULFID   1253   1262       BY SIMILARITY.



Note: remainder of annotations omitted. 

Query Match 15.0%; Score 131; DB 1; Length 2437;

Best Local Similarity 41.0%; Pred. No. 3.44e-10; 

Matches 16; Conservative 12; Mismatches 9; Indels 2; Gaps 2; 

Db 1352 SLRCRNGATCVSGHLSPRCLCAPGFSGHECQTRMDSPCL 1390 

:::!::! :| : I III Mill |: : ;:||: 
Qy 1 AFKCHHGQCHISDRGEPYCLCQPGFSGHHCE-Q-ENPCM 37 



RESULT 4 

ID CE10_CHICK STANDARD; PRT; 375 AA.

AC P19336; 

DT 01-NOV-1990 (REL. 16, CREATED) 

DT 01-NOV-1990 (REL. 16, LAST SEQUENCE UPDATE) 

DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE) 

DE CEF-10 PROTEIN PRECURSOR. 

OS GALLUS GALLUS (CHICKEN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ARCHOSAURIA; AVES;

OC NEOGNATHAE; GALLIFORMES; PHASIANIDAE; PHASIANINAE; GALLUS.

RN [1]

RP SEQUENCE FROM N.A.
RX MEDLINE; 89145206.

RA SIMMONS D.L., LEVY D.B., YANNONI Y., ERIKSON R.L.;

RT "Identification of a phorbol ester-repressible v-src-inducible gene.";

RL PROC. NATL. ACAD. SCI. U.S.A. 86:1178-1182(1989).

CC -!- FUNCTION: PROBABLE SECRETED REGULATORY PROTEIN.

CC -!- INDUCTION: BY V-SRC.

CC -!- SIMILARITY: BELONGS TO THE INSULIN-LIKE GROWTH FACTOR BINDING
CC PROTEIN FAMILY. CEF-10/CYR61/CTFG/FISP-12/NOV PROTEIN SUBFAMILY.
CC -!- SIMILARITY: CONTAINS 1 VWFC DOMAIN.
CC -!- SIMILARITY: CONTAINS 1 C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK).
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC

DR EMBL; J04496; G211436; -. 
DR PIR; A41428; A41428. 
DR PROSITE; PS00222; IGF_BINDING; 1.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01225; CTCK_2; 1.
DR PROSITE; PS01208; VWFC; 1.
DR PFAM; PF00007; Cys_knot; 1.
DR PFAM; PF00090; tsp_1; 1.
DR PFAM; PF00093; VWC; 1. 
DR PFAM; PF00219; IGFBP; 1. 
KW GROWTH FACTOR BINDING; SIGNAL. 



FT   SIGNAL        1     22
FT   CHAIN        23    375       CEF-10 PROTEIN.
FT   DOMAIN       98    164       VWFC.
FT   DOMAIN      281    355       CTCK.
FT   DISULFID    281    318       BY SIMILARITY.
FT   DISULFID    298    332       BY SIMILARITY.
FT   DISULFID    309    348       BY SIMILARITY.
FT   DISULFID    312    350       BY SIMILARITY.
FT   DISULFID    317    354       BY SIMILARITY.
SQ   SEQUENCE 375 AA; 40651 MW; 68B4BC92 CRC32;


Query Match 14.5%; Score 127; DB 1; Length 375;

Best Local Similarity 32.3%; Pred. No. 2.23e-09;
Matches 20; Conservative 11; Mismatches 28; Indels 3; Gaps 3;

Db 295 YAGCSSVKKYRPKYC-GSCVDGRCCTPQQTRTVKIRFRCDDGETFTKSVMMIQSCRCNYN 353 

1:1:: I : I |:| III ::: I |:| II :| | | | 
Qy 51 YASCATASKVPIMECRGGC-GTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCR-A 108

Db 354 CP 355 
I: 

Qy 109 CS 110 



RESULT 5

ID CYR6_MOUSE STANDARD; PRT; 379 AA.

AC P18406; 

DT 01-NOV-1990 (REL. 16, CREATED) 

DT 01-NOV-1990 (REL. 16, LAST SEQUENCE UPDATE) 

DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE) 

DE CYR61 PROTEIN PRECURSOR (3CH61). 

GN IGFBP10 OR CYR61. 

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=BALB/C; TISSUE=FIBROBLAST;

RX MEDLINE; 90287146.

RA O'BRIEN T.P., YANG G.P., SANDERS L., LAU L.F.;

RT "Expression of cyr61, a growth factor-inducible immediate-early 

RT gene."; 

RL MOL. CELL. BIOL. 10:3569-3577(1990). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=AJ; TISSUE=EMBRYONIC FIBROBLAST;

RX MEDLINE; 91288203. 

RA LATINKIC B.V., O'BRIEN T.P., LAU L.F.; 

RT "Promoter function and structure of the growth factor-inducible 

RT immediate early gene cyr61."; 

RL NUCLEIC ACIDS RES. 19:3261-3267(1991). 



CC -!- FUNCTION: MAY ACT AS ONE OF THE MANY GROWTH FACTOR-BINDING
CC PROTEINS; PROMOTES PROLIFERATION, MIGRATION AND ADHESION.
CC -!- TISSUE SPECIFICITY: LOW IN KIDNEY, ADRENAL GLAND, TESTES, BRAIN,
CC AND OVARY, MODERATE IN HEART, UTERUS, AND SKELETAL MUSCLE, HIGHEST
CC IN LUNG.
CC -!- DEVELOPMENTAL STAGE: EXPRESSED FROM G(0)/G(1) THROUGH MID-G(1) IN
CC NORMAL CELLS, AND AT A CONSTANT LEVEL IN RAPIDLY GROWING CELLS.
CC -!- INDUCTION: BY GROWTH FACTORS.
CC -!- SIMILARITY: BELONGS TO THE INSULIN-LIKE GROWTH FACTOR BINDING
CC PROTEIN FAMILY. CEF-10/CYR61/CTFG/FISP-12/NOV PROTEIN SUBFAMILY.
CC -!- SIMILARITY: CONTAINS 1 VWFC DOMAIN.
CC -!- SIMILARITY: CONTAINS 1 C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK).



CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

DR EMBL; M32490; G309206; -.
DR EMBL; X56790; G50633; -.
DR PIR; A35669; A35669.
DR MGD; MGI:88613; IGFBP10.
DR PROSITE; PS00222; IGF_BINDING; 1.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01225; CTCK_2; 1.
DR PROSITE; PS01208; VWFC; 1.
DR PFAM; PF00007; Cys_knot; 1.
DR PFAM; PF00090; tsp_1; 1.
DR PFAM; PF00093; vwc; 1.
DR PFAM; PF00219; IGFBP; 1.
KW GROWTH FACTOR BINDING; SIGNAL.
FT   SIGNAL        1     24       POTENTIAL.
FT   CHAIN        25    379       CYR61 PROTEIN.
FT   DOMAIN       98    164       VWFC.
FT   DOMAIN      284    358       CTCK.
FT   DISULFID    284    321       BY SIMILARITY.
FT   DISULFID    301    335       BY SIMILARITY.
FT   DISULFID    312    351       BY SIMILARITY.
FT   DISULFID    315    353       BY SIMILARITY.
FT   DISULFID    320    357       BY SIMILARITY.



SQ SEQUENCE 379 AA; 41709 MW; 116B80C7 CRC32; 

Query Match 14.4%; Score 126; DB 1; Length 379; 

Best Local Similarity 32.3%; Pred. No. 3.54e-09; 

Matches 20; Conservative 12; Mismatches 27; Indels 3; Gaps 3; 

Db 298 YAGCSSVKKYRPKYC-GSCVDGRCCTPLQTRTVKMRFRCEDGEMFSKNVMMIQSCKCNYN 356 

1:1:: I : I 1:1 II |:::: I |:| II I :| I I 
Qy 51 YASCATASKVPIMECRGGC-GTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCR-A 108

Db 357 CP 358 

I: 

Qy 109 CS 110 



RESULT 6
ID FAT_DROME STANDARD; PRT; 5147 AA.
AC P33450;
DT 01-FEB-1994 (REL. 28, CREATED)
DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE CADHERIN-RELATED TUMOR SUPPRESSOR PRECURSOR (FAT PROTEIN).
GN FT. 

OS DROSOPHILA MELANOGASTER (FRUIT FLY). 

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 
OC DROSOPHILIDAE; DROSOPHILA. 
RN [1] 

RP SEQUENCE FROM N.A. 
RX MEDLINE; 92069752.
RA MAHONEY P.A., WEBER U., ONOFRECHUK P., BIESSMANN H., BRYANT P.J.,

RA GOODMAN C.S.;

RT "The fat tumor suppressor gene in Drosophila encodes a novel member

RT of the cadherin gene superfamily.";

RL CELL 67:853-868(1991). 

CC -!- FUNCTION: COULD FUNCTION AS A CELL-ADHESION PROTEIN.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- DISEASE: RECESSIVE LETHAL MUTATIONS IN FAT CAUSE HYPERPLASTIC,

CC TUMOR-LIKE OVERGROWTH OF LARVAL IMAGINAL DISCS, DEFECTS IN

CC DIFFERENTIATION AND MORPHOGENESIS, AND DEATH DURING THE PUPAL

CC STAGE.

CC -!- SIMILARITY: BELONGS TO THE CADHERIN FAMILY.

CC -!- SIMILARITY: CONTAINS 37 CADHERINS-TYPE REPEATS.

CC -!- SIMILARITY: CONTAINS 5 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 2 LAMININ G-LIKE DOMAINS.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).



DR EMBL; M80537; G157409; -. 

DR PIR; A41087; IJFFTM. 

DR FLYBASE; FBgn0001075; ft. 

DR PROSITE; PS00232; CADHERIN; 22. 

DR PROSITE; PS00022; EGF_1; 4.

DR PROSITE; PS01186; EGF_2; 2.

DR PFAM; PF00008; EGF; 4.

DR PFAM; PF00028; cadherin; 33.

DR PFAM; PF00054; laminin_G; 1.

DR HSSP; P00740; 1IXA. 

KW CELL ADHESION; SIGNAL; TRANSMEMBRANE; CYTOSKELETON; GLYCOPROTEIN; 

KW CALCIUM-BINDING; REPEAT; EGF-LIKE DOMAIN. 



FT   SIGNAL        1     35       POTENTIAL.
FT   CHAIN        36   5147       CADHERIN-RELATED TUMOR SU
FT
Query Match 14.4%; Score 126; DB 1; Length 5147;

Best Local Similarity 44.4%; Pred. No. 3.54e-09;

Matches 16; Conservative 9; Mismatches 8; Indels 3; Gaps 3;

Db 4061 CRNGGSCQRSPDGSSYFCLCRPGFRGNQCESVSDSC 4096

I: I I :| llhlll |::|| ::| 
Qy 4 CHHG-QCHISDRGEPY-CLCQPGFSGHHCEQ-ENPC 36 
ID CRB_DROME STANDARD; PRT; 2139 AA.
AC P10040;

DT 01-MAR-1989 (REL. 10, CREATED)

DT 01-MAY-1991 (REL. 18, LAST SEQUENCE UPDATE) 

DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE) 

DE CRUMBS PROTEIN PRECURSOR (95F). 

GN CRB. 

OS DROSOPHILA MELANOGASTER (FRUIT FLY). 

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 
OC DROSOPHILIDAE; DROSOPHILA. 
RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=OREGON-R; TISSUE=EMBRYO;

RX MEDLINE; 90263104.

RA TEPASS U., THERES C., KNUST E.;
RT "Crumbs encodes an EGF-like protein expressed on apical membranes of
RT Drosophila epithelial cells and required for organization of
RT epithelia.";
RL CELL 61:787-799(1990).
RN [2]

RP SEQUENCE OF 1663-1955 FROM N.A.
RX MEDLINE; 87218537.

RA KNUST E., DIETRICH U., TEPASS U., BREMER K.A., WEIGEL D.,
RA VAESSIN H., CAMPOS-ORTEGA J.A.;

RT "EGF homologous sequences encoded in the genome of Drosophila
RT melanogaster, and their relation to neurogenic genes.";
RL EMBO J. 6:761-766(1987).

CC -!- FUNCTION: MAY PLAY A ROLE IN THE DEVELOPMENT OF EPITHELIA,
CC POSSIBLY FOR THE ESTABLISHMENT AND/OR MAINTENANCE OF CELL
CC POLARITY. IT MAY ACT AS A SIGNAL.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- PTM: PHOSPHORYLATED IN THE CYTOPLASMIC DOMAIN (POTENTIAL).
CC -!- SIMILARITY: CONTAINS 29 EGF-LIKE DOMAINS.

CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

DR EMBL; M33753; G552087; ALT_SEQ.
DR EMBL; X05144; E1746; -.
DR EMBL; X05144; G929536; -.

DR PIR; B26637; B26637.

DR PIR; A35672; A35672.

DR FLYBASE; FBgn0000368; crb.

DR PROSITE; PS00010; ASX_HYDROXYL; 15.

DR PROSITE; PS00022; EGF_1; 26.

DR PROSITE; PS01186; EGF_2; 17.

DR PROSITE; PS01187; EGF_CA; 15.

DR PFAM; PF00008; EGF; 26.

DR PFAM; PF00054; laminin_G; 3.

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE; 

KW GLYCOPROTEIN; SIGNAL; PHOSPHORYLATION. 
Note: remainder of annotations omitted.

Query Match 14.3%; Score 125; DB 1; Length 2139;

Best Local Similarity 42.1%; Pred. No. 5.62e-09;

Matches 16; Conservative 10; Mismatches 9; Indels 3; 

Db 1806 CLNNGTC-INQVAAFFCQCQPGFEGQHCEQNIDECADQ 1842 

I ::l I I:: : :| Hill |:INI: : I : 
Qy 4 C-HHGQCHISDRGEPYCLCQPGFSGHHCEQE-NPCMGE 39



RESULT 8 

ID NTC1_HUMAN STANDARD; PRT; 2444 AA.
AC P46531; 

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 

DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE) 



DE NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG 1 PRECURSOR (TRANSLOCATION-

DE ASSOCIATED NOTCH PROTEIN TAN-1) (FRAGMENT).

GN NOTCH1 OR TAN1.

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 
OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 
RN [1] 

RP SEQUENCE FROM N.A. 
RX MEDLINE; 91347367.

RA ELLISEN L.W., BIRD J., WEST D.C., SORENG A.L., REYNOLDS T.C., 
RA SMITH S.D., SKLAR J.; 

RT "TAN-1, the human homolog of the Drosophila notch gene, is broken by 
RT chromosomal translocations in T lymphoblastic neoplasms.";
RL CELL 66:649-661(1991). 

CC -!- FUNCTION: MAY BE IMPORTANT FOR NORMAL LYMPHOCYTE FUNCTION. IN
CC     ALTERED FORM, MAY CONTRIBUTE TO TRANSFORMATION OR PROGRESSION
CC     IN SOME T-CELL NEOPLASMS.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- TISSUE SPECIFICITY: IN FETAL TISSUES MOST ABUNDANT IN SPLEEN,
CC     BRAIN STEM AND LUNG. ALSO PRESENT IN MOST ADULT TISSUES WHERE IT
CC     IS FOUND MAINLY IN LYMPHOID TISSUES.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; M73980; G338675; -. 

DR MIM; 190198; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 20.

DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 26.

DR PROSITE; PS01187; EGF_CA; 18.

DR PFAM; PF00008; EGF; 35. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN. 



FT   SIGNAL        1     18       POTENTIAL.
FT   CHAIN        19  >2444       NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG 1.
FT   DOMAIN       19   1736       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM   1737   1757       POTENTIAL.
FT   DOMAIN     1758  >2444       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       20     58       EGF-LIKE 1.
FT   DOMAIN       59     99       EGF-LIKE 2.
FT   DOMAIN      102    139       EGF-LIKE 3.
FT   DOMAIN      140    176       EGF-LIKE 4.
FT   DOMAIN      178    216       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      218    255       EGF-LIKE 6.
FT   DOMAIN      257    293       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      295    333       EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      335    371       EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      372    410       EGF-LIKE 10.
FT   DOMAIN      412    450       EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      452    488       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      490    526       EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      528    564       EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      566    601       EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      603    639       EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      641    676       EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      678    714       EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      716    751       EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      753    789       EGF-LIKE 20.
FT   DOMAIN      791    827       EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      829    868       EGF-LIKE 22.
FT   DOMAIN      870    906       EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      908    944       EGF-LIKE 24.
FT   DOMAIN      946    982       EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      984   1020       EGF-LIKE 26.
FT   DOMAIN     1022   1058       EGF-LIKE 27.
FT   DOMAIN     1060   1096       EGF-LIKE 28.
FT   DOMAIN     1098   1144       EGF-LIKE 29.
FT   DOMAIN     1146   1182       EGF-LIKE 30.
FT   DOMAIN     1184   1220       EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1222   1266       EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1268   1306       EGF-LIKE 33.
FT   DOMAIN     1308   1347       EGF-LIKE 34.
FT   DOMAIN     1349   1385       EGF-LIKE 35.
FT   DOMAIN     1388   1427       EGF-LIKE 36.
FT   DOMAIN     1446   1563       3 X LIN/NOTCH REPEATS.
FT   REPEAT     1446   1481       LIN/NOTCH 1.
FT   REPEAT     1482   1523       LIN/NOTCH 2.
FT   REPEAT     1524   1563       LIN/NOTCH 3.
FT   DOMAIN     1876   2087       6 X ANK MOTIF REPEATS.
FT   REPEAT     1876   1921       ANK MOTIF 1.
FT   REPEAT     1923   1954       ANK MOTIF 2.
FT   REPEAT     1956   1987       ANK MOTIF 3.
FT   REPEAT     1990   2021       ANK MOTIF 4.
FT   REPEAT     2023   2054       ANK MOTIF 5.
FT   REPEAT     2056   2087       ANK MOTIF 6.
FT   DOMAIN     1576   1579       POLY-VAL.
FT   DOMAIN     1662   1665       POLY-ARG.
FT   DOMAIN     1729   1732       POLY-PRO.
FT   DOMAIN     1741   1744       POLY-ALA.
FT   DOMAIN     1902   1905       POLY-GLU.
FT   DOMAIN     2260   2263       POLY-GLY.
FT   DOMAIN     2404   2407       POLY-GLN.
FT   DOMAIN     2411   2418       POLY-PRO.
FT   DISULFID     24     37       BY SIMILARITY.
FT   DISULFID     31     46       BY SIMILARITY.
FT   DISULFID     48     57       BY SIMILARITY.
FT   DISULFID     63     74       BY SIMILARITY.
FT   DISULFID     68     87       BY SIMILARITY.
FT   DISULFID     89     98       BY SIMILARITY.
FT   DISULFID    106    117       BY SIMILARITY.
FT   DISULFID    111    127       BY SIMILARITY.
FT   DISULFID    129    138       BY SIMILARITY.
FT   DISULFID    144    155       BY SIMILARITY.
FT   DISULFID    149    164       BY SIMILARITY.
FT   DISULFID    166    175       BY SIMILARITY.
FT   DISULFID    182    195       BY SIMILARITY.
FT   DISULFID    189    204       BY SIMILARITY.
FT   DISULFID    206    215       BY SIMILARITY.
FT   DISULFID    222    233       BY SIMILARITY.
FT   DISULFID    227    243       BY SIMILARITY.
FT   DISULFID    245    254       BY SIMILARITY.
FT   DISULFID    261    272       BY SIMILARITY.
FT   DISULFID    266    281       BY SIMILARITY.
FT   DISULFID    283    292       BY SIMILARITY.
FT   DISULFID    299    312       BY SIMILARITY.
FT   DISULFID    306    321       BY SIMILARITY.
FT   DISULFID    323    332       BY SIMILARITY.
FT   DISULFID    339    350       BY SIMILARITY.
FT   DISULFID    344    359       BY SIMILARITY.
FT   DISULFID    361    370       BY SIMILARITY.
FT   DISULFID    376    387       BY SIMILARITY.
FT   DISULFID    381    398       BY SIMILARITY.
FT   DISULFID    400    409       BY SIMILARITY.
FT   DISULFID    416    429       BY SIMILARITY.
FT   DISULFID    423    438       BY SIMILARITY.
FT   DISULFID    440    449       BY SIMILARITY.
FT   DISULFID    456    467       BY SIMILARITY.
FT   DISULFID    461    476       BY SIMILARITY.
FT   DISULFID    478    487       BY SIMILARITY.
FT   DISULFID    494    505       BY SIMILARITY.
FT   DISULFID    499    514       BY SIMILARITY.
FT   DISULFID    516    525       BY SIMILARITY.
FT   DISULFID    532    543       BY SIMILARITY.
FT   DISULFID    537    552       BY SIMILARITY.
FT   DISULFID    554    563       BY SIMILARITY.
FT   DISULFID    570    580       BY SIMILARITY.
FT   DISULFID    575    589       BY SIMILARITY.
FT   DISULFID    591    600       BY SIMILARITY.
FT   DISULFID    607    618       BY SIMILARITY.
FT   DISULFID    612    627       BY SIMILARITY.
FT   DISULFID    629    638       BY SIMILARITY.
FT   DISULFID    645    655       BY SIMILARITY.
FT   DISULFID    650    664       BY SIMILARITY.
FT   DISULFID    666    675       BY SIMILARITY.
FT   DISULFID    682    693       BY SIMILARITY.
FT   DISULFID    687    702       BY SIMILARITY.
FT   DISULFID    704    713       BY SIMILARITY.
FT   DISULFID    720    730       BY SIMILARITY.
FT   DISULFID    725    739       BY SIMILARITY.
FT   DISULFID    741    750       BY SIMILARITY.
FT   DISULFID    757    768       BY SIMILARITY.
FT   DISULFID    762    777       BY SIMILARITY.
FT   DISULFID    779    788       BY SIMILARITY.
FT   DISULFID    795    806       BY SIMILARITY.
FT   DISULFID    800    815       BY SIMILARITY.
FT   DISULFID    817    826       BY SIMILARITY.
FT   DISULFID    833    844       BY SIMILARITY.
FT   DISULFID    838    855       BY SIMILARITY.
FT   DISULFID    857    867       BY SIMILARITY.
FT   DISULFID    874    885       BY SIMILARITY.
FT   DISULFID    879    894       BY SIMILARITY.
FT   DISULFID    896    905       BY SIMILARITY.
FT   DISULFID    912    923       BY SIMILARITY.
FT   DISULFID    917    932       BY SIMILARITY.
FT   DISULFID    934    943       BY SIMILARITY.
FT   DISULFID    988    999       BY SIMILARITY.
FT   DISULFID    993   1008       BY SIMILARITY.
FT   DISULFID   1010   1019       BY SIMILARITY.
FT   DISULFID   1026   1037       BY SIMILARITY.
FT   DISULFID   1031   1046       BY SIMILARITY.
FT   DISULFID   1048   1057       BY SIMILARITY.
FT   DISULFID   1064   1075       BY SIMILARITY.
FT   DISULFID   1069   1084       BY SIMILARITY.
FT   DISULFID   1086   1095       BY SIMILARITY.
FT   DISULFID   1102   1123       BY SIMILARITY.
FT   DISULFID   1117   1132       BY SIMILARITY.
FT   DISULFID   1134   1143       BY SIMILARITY.
FT   DISULFID   1150   1161       BY SIMILARITY.
FT   DISULFID   1155   1170       BY SIMILARITY.
FT   DISULFID   1172   1181       BY SIMILARITY.
FT   DISULFID   1188   1199       BY SIMILARITY.
FT   DISULFID   1193   1208       BY SIMILARITY.

Note: remainder of annotations omitted.

Query Match 14.3%; Score 125; DB 1; Length 2444;
Best Local Similarity 23.7%; Pred. No. 5.62e-09;
Matches 27; Conservative 32; Mismatches 46; Indels 9; Gaps

Db 575 CHYGSCK-DGVATFTCLCRPGYTGHHCETNINECSSQPCRLRGTCQDPDNAYLCFCLKGT 633
tl 1 : llhll : 1 1 :: :| : : :: 1 1 ::
Qy 4 CHHGQCHISDRGEPYCLCQPGFSGHHCEQE-NPCMGE--IVREAIR-RQKDYAS-CATAS 58

Db 634 TGPNCEINLDDCASSPCDSGTCLDKIDGYECACEPGYTGSMCNSNIDECAGNPC 687
1 1 : ::|: ::: : ; :; II: :l
Qy 59 KVPIMECR-GGCGTTCCQPIRSKRRKYVFQCTDGSSFVEEV-ERHL-ECGCRAC 109



RESULT 9
ID NTC1_RAT STANDARD; PRT; 2531 AA.
AC Q07008;
DT 01-NOV-1995 (REL. 32, CREATED)
DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1 PRECURSOR.
GN NOTCH1.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=SCHWANN CELL;
RX MEDLINE; 92111383.
RA WEINMASTER G., ROBERTS V.J., LEMKE G.;
RT "A homolog of Drosophila Notch expressed during mammalian
RT development.";
RL DEVELOPMENT 113:199-205(1991).
CC -!- FUNCTION: REQUIRED FOR THE CORRECT DIFFERENTIATION OF A NUMBER
CC     OF TISSUES.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DEVELOPMENTAL STAGE: IN THE EMBRYO, HIGHEST LEVELS OCCUR BETWEEN
CC     DAYS 12 AND 14 AND DECREASE RAPIDLY TO MUCH LOWER LEVELS IN THE
CC     ADULT.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; X57405; G57635; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 22.
DR PROSITE; PS00022; EGF_1; 35.
DR PROSITE; PS01186; EGF_2; 26.
DR PROSITE; PS01187; EGF_CA; 21.
DR PFAM; PF00008; EGF; 35.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.
FT   SIGNAL        1     18       POTENTIAL.
FT   CHAIN        19   2531       NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1.
FT   DOMAIN       19   1723       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM   1724   1746       POTENTIAL.
FT   DOMAIN     1747   2531       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       20     58       EGF-LIKE 1.
FT   DOMAIN       59     99       EGF-LIKE 2.
FT   DOMAIN      102    139       EGF-LIKE 3.
FT   DOMAIN      140    176       EGF-LIKE 4.
FT   DOMAIN      178    216       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      218    255       EGF-LIKE 6.
FT   DOMAIN      257    293       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      295    333       EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      335    371       EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      372    410       EGF-LIKE 10.
FT   DOMAIN      412    450       EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      452    488       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      490    526       EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      528    564       EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      566    601       EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      603    639       EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      641    676       EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      678    714       EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      716    751       EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      753    789       EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      791    827       EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      829    867       EGF-LIKE 22.
FT   DOMAIN      869    905       EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      907    943       EGF-LIKE 24.
FT   DOMAIN      945    981       EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      983   1019       EGF-LIKE 26.
FT   DOMAIN     1021   1057       EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1059   1095       EGF-LIKE 28.
FT   DOMAIN                       EGF-LIKE 29.
FT   DOMAIN     1145   1181       EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1183   1219       EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN                       EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1267   1305       EGF-LIKE 33.
FT   DOMAIN     1307   1346       EGF-LIKE 34.
FT   DOMAIN     1348              EGF-LIKE 35.
FT   DOMAIN     1387   1426       EGF-LIKE 36.
FT   DOMAIN     1449   1462       CYS-RICH.
FT   DOMAIN                       6 X ANK MOTIF REPEATS.
FT   REPEAT     1865   1910       ANK MOTIF 1.
FT   REPEAT     1912   1942       ANK MOTIF 2.
FT   REPEAT     1944   1975       ANK MOTIF 3.
FT   REPEAT     1978   2009       ANK MOTIF 4.
FT   REPEAT     2011   2042       ANK MOTIF 5.
FT   REPEAT     2044   2076       ANK MOTIF 6.
FT   DISULFID     24     37       BY SIMILARITY.
FT   DISULFID     31     46       BY SIMILARITY.
FT   DISULFID     48     57       BY SIMILARITY.
FT   DISULFID     63     74       BY SIMILARITY.
FT   DISULFID     68     87       BY SIMILARITY.
FT   DISULFID     89     98       BY SIMILARITY.
FT   DISULFID    106    117       BY SIMILARITY.
FT   DISULFID    111    127       BY SIMILARITY.
FT   DISULFID    129    138       BY SIMILARITY.
FT   DISULFID           155       BY SIMILARITY.
FT   DISULFID    149              BY SIMILARITY.
FT   DISULFID    166              BY SIMILARITY.
FT   DISULFID    182    195       BY SIMILARITY.
FT   DISULFID    189    204       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID    222              BY SIMILARITY.
FT   DISULFID    227              BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID    261    272       BY SIMILARITY.
FT   DISULFID    266    281       BY SIMILARITY.
FT   DISULFID    283    292       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID    306              BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID    339    350       BY SIMILARITY.
FT   DISULFID    344    359       BY SIMILARITY.
FT   DISULFID    361    370       BY SIMILARITY.
FT   DISULFID    376    387       BY SIMILARITY.
FT   DISULFID    381    398       BY SIMILARITY.
FT   DISULFID    400    409       BY SIMILARITY.
FT   DISULFID    416    429       BY SIMILARITY.
FT   DISULFID    423    438       BY SIMILARITY.
FT   DISULFID    440    449       BY SIMILARITY.
FT   DISULFID    456    467       BY SIMILARITY.
FT   DISULFID    461    476       BY SIMILARITY.
FT   DISULFID    478    487       BY SIMILARITY.
FT   DISULFID    494    505       BY SIMILARITY.
FT   DISULFID    499    514       BY SIMILARITY.
FT   DISULFID    516    525       BY SIMILARITY.
FT   DISULFID    532    543       BY SIMILARITY.
FT   DISULFID    537    552       BY SIMILARITY.
FT   DISULFID    554    563       BY SIMILARITY.
FT   DISULFID    570    580       BY SIMILARITY.
FT   DISULFID    575    589       BY SIMILARITY.
FT   DISULFID    591    600       BY SIMILARITY.
FT   DISULFID    607    618       BY SIMILARITY.
FT   DISULFID    612    627       BY SIMILARITY.
FT   DISULFID    629    638       BY SIMILARITY.
FT   DISULFID    645    655       BY SIMILARITY.
FT   DISULFID    650    664       BY SIMILARITY.
FT   DISULFID    666    675       BY SIMILARITY.
FT   DISULFID    682    693       BY SIMILARITY.
FT   DISULFID    687    702       BY SIMILARITY.
FT   DISULFID    704    713       BY SIMILARITY.
FT   DISULFID    720    730       BY SIMILARITY.
FT   DISULFID    725    739       BY SIMILARITY.
FT   DISULFID    741    750       BY SIMILARITY.
FT   DISULFID    757    768       BY SIMILARITY.
FT   DISULFID    762    777       BY SIMILARITY.
FT   DISULFID    779    788       BY SIMILARITY.
FT   DISULFID    795    806       BY SIMILARITY.
FT   DISULFID    800    815       BY SIMILARITY.
FT   DISULFID    817              BY SIMILARITY.
FT   DISULFID    833    844       BY SIMILARITY.
FT   DISULFID    838    855       BY SIMILARITY.
FT   DISULFID    857              BY SIMILARITY.
FT   DISULFID    873              BY SIMILARITY.
FT   DISULFID    878    893       BY SIMILARITY.
FT   DISULFID    895    904       BY SIMILARITY.
FT   DISULFID    911    922       BY SIMILARITY.
FT   DISULFID    916    931       BY SIMILARITY.
FT   DISULFID    933    942       BY SIMILARITY.
FT   DISULFID    987              BY SIMILARITY.
FT   DISULFID    992   1007       BY SIMILARITY.
FT   DISULFID   1009   1018       BY SIMILARITY.
FT   DISULFID   1025   1036       BY SIMILARITY.
FT   DISULFID   1030   1045       BY SIMILARITY.
FT   DISULFID   1047   1056       BY SIMILARITY.
FT   DISULFID   1063   1074       BY SIMILARITY.
FT   DISULFID   1068   1083       BY SIMILARITY.
FT   DISULFID   1085   1094       BY SIMILARITY.
FT   DISULFID   1101   1122       BY SIMILARITY.
FT   DISULFID   1116   1131       BY SIMILARITY.
FT   DISULFID   1133   1142       BY SIMILARITY.
FT   DISULFID   1149   1160       BY SIMILARITY.
FT   DISULFID   1154   1169       BY SIMILARITY.
FT   DISULFID   1171   1180       BY SIMILARITY.
FT   DISULFID   1187   1198       BY SIMILARITY.
FT   DISULFID   1192   1207       BY SIMILARITY.
FT   DISULFID   1209   1218       BY SIMILARITY.
FT   DISULFID   1225   1244       BY SIMILARITY.
FT   DISULFID   1238   1253       BY SIMILARITY.
FT   DISULFID   1255   1264       BY SIMILARITY.
FT   DISULFID   1271   1284       BY SIMILARITY.
FT   DISULFID   1276   1293       BY SIMILARITY.
FT   DISULFID   1295   1304       BY SIMILARITY.
FT   DISULFID   1311   1322       BY SIMILARITY.
FT   DISULFID   1316   1334       BY SIMILARITY.
FT   DISULFID   1336   1345       BY SIMILARITY.
FT   DISULFID   1352   1363       BY SIMILARITY.
FT   DISULFID   1357   1372       BY SIMILARITY.
FT   DISULFID   1374   1383       BY SIMILARITY.



Note: remainder of annotations omitted. 

Query Match 14.3%; Score 125; DB 1; Length 2531;

Best Local Similarity 25.4%; Pred. No. 5.62e-09;

Matches 30; Conservative 31; Mismatches 45; Indels 12; Gaps 11; 

Db 570 CDPDPCHIGLCKDGVATFTCLCQPGYTGHHCETNINECHSQPCRHGGTCQDRDNYYLCLC 629 

I Ml: : I : 1 1 1 1 1 1 :: 1 1 1 1 1 : I I :: I : :::| I I 
Qy 4 CHHGQCHISD-R-G-EPY-CLCQPGFSGHHCEQE-NPCMGEIVREA-IRRQKD--YAS-C 54 

Db 630 LKGTTGPNCEINLDDCASNPCDSGTCLDKIDGYECACEPGYTGSMCNVNIDECAGSPC 687 

:: I I I:: |:: : ::|: ::: : : :: ||: :| 
Qy 55 ATASKVPIMECR-GGCGTTCCQPIRSKRRKYVFQCTDGSSFVEEV-ERHL-ECGCRAC 109
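
The Best Local Similarity figures in these summaries appear to equal the number of exact matches divided by the total number of alignment columns (matches + conservative + mismatches + indels). That reading is an assumption inferred from the reported counts; the report itself does not state the formula. A minimal Python sketch under that assumption (the function name is hypothetical):

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent of alignment columns that are exact matches.

    Assumed definition, inferred from the reported counts rather than
    taken from the search program's documentation.
    """
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Counts copied from the RESULT 9 summary (30, 31, 45, 12).
print(round(best_local_similarity(30, 31, 45, 12), 1))
```

Applied to the RESULT 9 counts, the sketch reproduces the reported value to one decimal place, which supports the assumed definition without proving it.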



RESULT 10 

ID NOTC_DROME STANDARD; PRT; 2703 AA.

AC P07207; P04154; 

DT 01-NOV-1986 (REL. 03, CREATED) 

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH PROTEIN PRECURSOR. 

GN N. 

OS DROSOPHILA MELANOGASTER (FRUIT FLY). 

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.

RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 86079539.
RA WHARTON K.A., JOHANSEN K.M., XU T., ARTAVANIS-TSAKONAS S.;
RT "Nucleotide sequence from the neurogenic locus notch implies a gene
RT product that shares homology with proteins containing EGF-like
RT repeats.";
RL CELL 43:567-581(1985).
RN [2]
RP SEQUENCE FROM N.A.
RC STRAIN=OREGON-R;
RX MEDLINE; 87064624.
RA KIDD S., KELLEY M.R., YOUNG M.W.;
RT "Sequence of the notch locus of Drosophila melanogaster: relationship
RT of the encoded protein to mammalian clotting and growth factors.";
RL MOL. CELL. BIOL. 6:3094-3108(1986).
RN [3]
RP SEQUENCE OF 2505-2611 FROM N.A.
RX MEDLINE; 85099329.
RA WHARTON K.A., YEDVOBNICK B., FINNERTY V.G., ARTAVANIS-TSAKONAS S.;
RT "opa: a novel family of transcribed repeats shared by the Notch locus
RT and other developmentally regulated loci in D. melanogaster.";
RL CELL 40:55-62(1985).
RN [4]
RP SEQUENCE OF 1-8 FROM N.A.
RX MEDLINE; 87257846.
RA KELLEY M.R., KIDD S., BERG R.L., YOUNG M.W.;
RT "Restriction of P-element insertions at the Notch locus of Drosophila
RL MOL. CELL. BIOL. 7:1545-1548(1987).
RN [5]
RP REVIEW.
RA HARRIS W.A.;
RT "Many cell types specified by Notch function.";
RL CURR. BIOL. 1:120-122(1991).
CC -!- FUNCTION: NOTCH PROTEIN IS ESSENTIAL FOR DIFFERENTIATION OF
CC     ECTODERM.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- SEPARATION OF NEUROBLASTS FROM THE ECTODERM INTO THE INNER PART
CC     OF EMBRYO IS ONE OF THE FIRST STEPS OF CNS DEVELOPMENT IN INSECTS.
CC     THIS PROCESS IS UNDER CONTROL OF THE NEUROGENIC GENES.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

DR EMBL; M16152; G157988; -.
DR EMBL; M16153; G157988; JOINED.
DR EMBL; M16149; G157988; JOINED.
DR EMBL; M16150; G157988; JOINED.
DR EMBL; M16151; G157988; JOINED.
DR EMBL; K03508; G157993; -.
DR EMBL; M13689; G157993; JOINED.
DR EMBL; K03507; G157993; JOINED.
DR EMBL; M12175; G950317; -.
DR EMBL; M16025; G157995; -.
DR PIR; A24420; A24420.
DR PIR; A24768; A24768.
DR PIR; A05267; A05267.
DR FLYBASE; FBgn0004647; N.
DR PROSITE; PS00010; ASX_HYDROXYL; 22.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 28.
DR PROSITE; PS01187; EGF_CA; 22.

DR PFAM; PF00008; EGF; 36. 

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3.

DR HSSP; P00740; 1IXA.

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.



FT   SIGNAL        1     44       POTENTIAL.
FT   CHAIN        45   2703       NEUROGENIC LOCUS NOTCH PROTEIN.
FT   DOMAIN       45   1745       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM   1746   1766       POTENTIAL.
FT   DOMAIN     1767   2703       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       58   1451       36 X EGF-TYPE REPEATS.
FT   DOMAIN       58     95       EGF-LIKE 1.
FT   DOMAIN       96    136       EGF-LIKE 2.
FT   DOMAIN      139    176       EGF-LIKE 3.
FT   DOMAIN      177    215       EGF-LIKE 4.
FT   DOMAIN      217    253       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      255    291       EGF-LIKE 6.
FT   DOMAIN      293    329       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      331    370       EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      372    408       EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      409    447       EGF-LIKE 10.
FT   DOMAIN      449    486       EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      488    524       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      526    562       EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      564    600       EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      602    637       EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      639    675       EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      677    713       EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      715    751       EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      753    789       EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      791    827       EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      829    865       EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      867    905       EGF-LIKE 22.
FT   DOMAIN                       EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN                       EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      984   1020       EGF-LIKE 25.
FT   DOMAIN            1058       EGF-LIKE 26, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN     1060   1096       EGF-LIKE 27.
FT   DOMAIN                       EGF-LIKE 28.
FT   DOMAIN                       EGF-LIKE 29.
FT   DOMAIN            1219       EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN                       EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN            1295       EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN                       EGF-LIKE 33.
FT   DOMAIN                       EGF-LIKE 34.
FT   DOMAIN                       EGF-LIKE 35.
FT   DOMAIN                       EGF-LIKE 36.
FT   DOMAIN     1475   1593       3 X LIN/NOTCH REPEATS.
FT   REPEAT     1475   1513       LIN/NOTCH 1.
FT   REPEAT     1514   1553       LIN/NOTCH 2.
FT   REPEAT     1554   1593       LIN/NOTCH 3.
FT   DOMAIN     1896              6 X ANK MOTIF REPEATS.
FT   DOMAIN     2538   2568       POLY-GLN (OPA REPEAT).
FT   DISULFID     62     73       BY SIMILARITY.
FT   DISULFID     67     83       BY SIMILARITY.
FT   DISULFID     85     94       BY SIMILARITY.
FT   DISULFID    100    111       BY SIMILARITY.
FT   DISULFID    105    124       BY SIMILARITY.
FT   DISULFID    126    135       BY SIMILARITY.
FT   DISULFID    143    154       BY SIMILARITY.
FT   DISULFID    148    164       BY SIMILARITY.
FT   DISULFID    166    175       BY SIMILARITY.
FT   DISULFID    181    192       BY SIMILARITY.
FT   DISULFID    186    203       BY SIMILARITY.
FT   DISULFID    205    214       BY SIMILARITY.
FT   DISULFID    221    232       BY SIMILARITY.
FT   DISULFID    226    241       BY SIMILARITY.
FT   DISULFID    243    252       BY SIMILARITY.
FT   DISULFID    259    270       BY SIMILARITY.
FT   DISULFID    264    279       BY SIMILARITY.
FT   DISULFID    281    290       BY SIMILARITY.
FT   DISULFID    297              BY SIMILARITY.
FT   DISULFID           317       BY SIMILARITY.
FT   DISULFID    319    328       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID    343    358       BY SIMILARITY.
FT   DISULFID    360    369       BY SIMILARITY.
FT   DISULFID    376    387       BY SIMILARITY.
FT   DISULFID    381    396       BY SIMILARITY.
FT   DISULFID    398    407       BY SIMILARITY.
FT   DISULFID    413    424       BY SIMILARITY.
FT   DISULFID    418    435       BY SIMILARITY.
FT   DISULFID    437    446       BY SIMILARITY.
FT   DISULFID    453    465       BY SIMILARITY.
FT   DISULFID    459    474       BY SIMILARITY.
FT   DISULFID    476    485       BY SIMILARITY.
FT   DISULFID    492    503       BY SIMILARITY.
FT   DISULFID    497    512       BY SIMILARITY.
FT   DISULFID    514    523       BY SIMILARITY.
FT   DISULFID    530    541       BY SIMILARITY.
FT   DISULFID    535    550       BY SIMILARITY.
FT   DISULFID    552    561       BY SIMILARITY.
FT   DISULFID    568    579       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID    611    625       BY SIMILARITY.
FT   DISULFID    627    636       BY SIMILARITY.
FT   DISULFID                     BY SIMILARITY.
FT   DISULFID    648    663       BY SIMILARITY.
FT   DISULFID    665              BY SIMILARITY.
FT   DISULFID           692       BY SIMILARITY.
FT   DISULFID    686    701       BY SIMILARITY.
FT   DISULFID    703    712       BY SIMILARITY.
FT   DISULFID    719    730       BY SIMILARITY.
FT   DISULFID    724    739       BY SIMILARITY.
FT   DISULFID    741    750       BY SIMILARITY.
FT   DISULFID    757    768       BY SIMILARITY.
FT   DISULFID    762    777       BY SIMILARITY.
FT   DISULFID    779    788       BY SIMILARITY.
FT   DISULFID    795    806       BY SIMILARITY.
FT   DISULFID    800    815       BY SIMILARITY.
FT   DISULFID    817    826       BY SIMILARITY.
FT   DISULFID    833    844       BY SIMILARITY.
FT   DISULFID    838    853       BY SIMILARITY.
FT   DISULFID    855    864       BY SIMILARITY.



Note: remainder of annotations omitted.



Query Match 14.0%; Score 122; DB 1; Length 2703;

Best Local Similarity 40.5%; Pred. No. 2.23e-08;

Matches 17; Conservative 9; Mismatches 11; Indels 5; Gaps 5;

Db 952 SFPCQNGGTC-LDGIGD-YSCLCVDGFDGKHCETDINECLSQ 991 

:! I::| I : hi III II III : I |::: 
Qy 1 AFKCHHG-QCHISDRGEPY-CLCQPGFSGHHCEQE-NPCMGE 39 



RESULT 11 

ID NTC4_MOUSE STANDARD; PRT; 1964 AA.

AC P31695; 062389; 

DT 01-JUL-1993 (REL. 26, CREATED)

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 4 PRECURSOR (TRANSFORMING 

DE PROTEIN INT-3). 

GN NOTCH4 OR INT3 OR INT-3.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1]

RP SEQUENCE FROM N.A.

RX MEDLINE; 92194507.

RA ROBBINS J., BLONDEL B.J., GALLAHAN D., CALLAHAN R.;
RT "Mouse mammary tumor gene int-3: a member of the notch gene family
RT transforms mammary epithelial cells.";
RL J. VIROL. 66:2594-2599(1992).
RN [2]
RP REVISIONS, SEQUENCE FROM N.A.
RA CALLAHAN R.;
RL SUBMITTED (NOV-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.
RN [3]
RP SEQUENCE FROM N.A.
RC TISSUE=LUNG, AND TESTIS;
RX MEDLINE; 96281668.
RA UYTTENDAELE H., MARAZZI G., WU G., YAN Q., SASSOON D., KITAJEWSKI J.;
RT "Notch4/int-3, a mammary proto-oncogene, is an endothelial
RT cell-specific mammalian Notch gene.";
RL DEVELOPMENT 122:2251-2259(1996).
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DISEASE: ACTIVATED INT-3 TRANSFORMS MAMMARY EPITHELIAL CELLS.
CC -!- SIMILARITY: CONTAINS 29 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 CDC10/SWI6 REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

FT DOMAIN 964 1000 EGF-LIKE 25.
FT DOMAIN 1002 1040 EGF-LIKE 26.
FT DOMAIN 1042 1081 EGF-LIKE 27.
FT DOMAIN 1083 1122 EGF-LIKE 28.
FT DOMAIN 1126 1167 EGF-LIKE 29.
FT DOMAIN 1168 1282 3 X LIN/NOTCH REPEATS.
FT REPEAT 1168 1208 LIN/NOTCH 1.
FT REPEAT 1209 1242 LIN/NOTCH 2.
FT REPEAT 1243 1282 LIN/NOTCH 3.
FT DOMAIN 1572 1785 6 X ANK MOTIF REPEATS.
FT REPEAT 1572 1603 ANK MOTIF 1.
FT REPEAT 1622 1653 ANK MOTIF 2.
FT REPEAT 1654 1685 ANK MOTIF 3.
FT REPEAT 1688 1719 ANK MOTIF 4.
FT REPEAT 1721 1752 ANK MOTIF 5.
FT REPEAT 1754 1785 ANK MOTIF 6.
FT DISULFID 25 38 BY SIMILARITY.
FT DISULFID 32 48 BY SIMILARITY.
FT DISULFID 50 59 BY SIMILARITY.
FT DISULFID 65 77 BY SIMILARITY.
FT DISULFID 71 100 BY SIMILARITY.
FT DISULFID 102 111 BY SIMILARITY.
FT DISULFID 119 130 BY SIMILARITY.
FT DISULFID 124 140 BY SIMILARITY.
FT DISULFID 142 151 BY SIMILARITY.
FT DISULFID 157 168 BY SIMILARITY.
FT DISULFID 162 177 BY SIMILARITY.
FT DISULFID 179 188 BY SIMILARITY.


FT 


DISULFID 
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BY SIMILARITY. 
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DISULFID 
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BY SIMILARITY. 


DR 


EMBL; M80456; G1714084; -. 


FT 


DISULFID 


219 


228 


BY SIMILARITY. 


DR 


EMBL; 043691; G1401160; -. 


FT 


DISULFID 
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246 


BY SIMILARITY. 


DR 


PIR; A38072; TVMVT3.


FT 


DISULFID 


240 


259 


BY SIMILARITY. 


DR 


MGD; MGI:107471; NOTCH4.
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DISULFID 
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270 


BY SIMILARITY. 


DR 


PROSITE; PS00010; ASX_HYDROXYL; 11.


FT 


DISULFID 


235 


246 


BY SIMILARITY. 


DR 


PROSITE; PS00022; EGF_1; 28. 
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DISULFID 
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259 


BY SIMILARITY, 


DR 


PROSITE; PS01186; EGF_2; 21.


FT 


DISULFID 


261 


270 


BY SIMILARITY, 


DR 


PROSITE; PS01187; EGF_CA; 9.


FT 


DISULFID 
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288 


BY SIMILARITY, 


DR 


PFAM; PF00008; EGF; 26.
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DISULFID 
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BY SIMILARITY, 


DR 


PFAM; PF00023; ank; 6.


FT 


DISULFID 


299 


308 


BY SIMILARITY. 


DR 


PFAM; PF00066; notch; 2.


FT 


DISULFID 


315 


329 


BY SIMILARITY. 


DR 


HSSP; P00740; 1IXA. 


FT 


DISULFID 
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338 


BY SIMILARITY. 


KW 


DIFFERENTIATION; NEUROGENESIS; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE;


FT 


DISULFID 


340 


349 


BY SIMILARITY. 


KW 


GLYCOPROTEIN; PROTO-ONCOGENE; ANK REPEAT; SIGNAL. 


FT 


DISULFID 
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367 


BY SIMILARITY. 


rr 


SIGNAL 


1 


20 POTENTIAL, 
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DISULFID 
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376 


BY SIMILARITY. 


1 


CHAIN 


21 


1964 NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 4.


FT 


DISULFID 


378 


387 


BY SIMILARITY. 


1 


DOMAIN 


21 


1443 EXTRACELLULAR (POTENTIAL). 
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DISULFID 
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404 


BY SIMILARITY. 


n 


TRANSMEM 


1444 


1464 POTENTIAL. 
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DISULFID 
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415 


BY SIMILARITY. 
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DOMAIN 


1465 


1964 CYTOPLASMIC (POTENTIAL). 


FT 


DISULFID 


417 


426 


BY SIMILARITY, 


FT 


DOMAIN 


21 


60 EGF-LIKE 1. 
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BY SIMILARITY. 
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112 EGF-LIKE 2. 
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BY SIMILARITY. 
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152 EGF-LIKE 3. 
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BY SIMILARITY. 


FT 


DOMAIN 


153 


189 EGF-LIKE 4. 
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BY SIMILARITY. 


FT 


DOMAIN 


191 


229 EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL). 
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BY SIMILARITY. 
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271 EGF-LIKE 6. 
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BY SIMILARITY. 
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309 EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL). 
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BY SIMILARITY. 
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350 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL). 
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BY SIMILARITY. 
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388 EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL). 
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BY SIMILARITY. 
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427 EGF-LIKE 10. 
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BY SIMILARITY. 
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470 EGF-LIKE 11, CALCIUM- BINDING (POTENTIAL), 
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BY SIMILARITY. 
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508 EGF-LIKE 12, CALCIUM- BINDING (POTENTIAL). 
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BY SIMILARITY. 
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546 EGF-LIKE 13, CALCIUM- BINDING (POTENTIAL), 
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BY SIMILARITY. 
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584 EGF-LIKE 14, CALCIUM- BINDING (POTENTIAL) . 
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BY SIMILARITY. 


FT 


DOMAIN 


586 


622 EGF-LIKE 15, CALCIUM- BINDING (POTENTIAL), 
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BY SIMILARITY. 
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656 EGF-LIKE 16. 
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BY SIMILARITY. 
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686 EGF-LIKE 17, 
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BY SIMILARITY. 
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724 EGF-LIKE 18, 
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BY SIMILARITY. 
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762 EGF-LIKE 19. 
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BY SIMILARITY. 
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800 EGF-LIKE 20. 
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674 


BY SIMILARITY. 
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839 EGF-LIKE 21. 
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685 


BY SIMILARITY. 
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841 


877 EGF-LIKE 22. 
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DISULFID 
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703 


BY SIMILARITY. 


FT 


DOMAIN 


878 


924 EGF-LIKE 23. 
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DISULFID 
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712 


BY SIMILARITY. 


FT 


DOMAIN 


926 


962 EGF-LIKE 24, 


FT 


DISULFID 
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BY SIMILARITY. 
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DISULFID 
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FT DISULFID 

FT DISULFID 
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829 

845 

850 

867 

882 
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914 

930 

935 

952 

968 

973 988 

990 999 
1006 1019 
1011 1028 
1030 1039 
1046 1057 
1051 1069 
1071 1080 
1087 1098 
1092 1110 
1112 1121 
1130 1142 
1136 1155 
1157 1166 

711 711 

960 960 
1139 1139 



741 
750 
761 
779 
788 
799 
818 
827 
838 
856 
865 
876 
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912 
923 
941 
950 
961 
979 



BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY. 
POTENTIAL, 
POTENTIAL. 
POTENTIAL. 
O -> R (IN REF. 3).
L -> P (IN REF. 3).
M -> K (IN REF. 3).



Note: remainder of annotations omitted. 

Query Match 13.9%; Score 121; DB 1; Length 1964; 

Best Local Similarity 40.5%; Pred. No. 3.52e-08;

Matches 15; Conservative 8; Mismatches 13; Indels 1; Gaps 1; 

Db 443 CEHGGSCINTPGSFNCLCLPGYTGSRCEADHNECLSQ 479 

III hi III I:: :|| : | |::: 
Qy 4 CHHGQCHISDRGEPYCLCQPGFSGHHCEQE-NPCMGE 39
RESULT 12

ID NOTC_XENLA STANDARD; PRT; 2524 AA.

AC P21783; 

DT 01-MAY-1991 (REL. 18, CREATED) 

DT 01-OCT-1996 (REL. 34, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG PRECURSOR (XOTCH PROTEIN). 

GN XOTCH. 

OS XENOPUS LAEVIS (AFRICAN CLAWED FROG).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; AMPHIBIA; BATRACHIA; ANURA; 

OC MESOBATRACHIA; PIPOIDEA; PIPIDAE; XENOPODINAE; XENOPUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 90385285. 

RA COFFMAN C., HARRIS W., KINTNER C.;

RT "Xotch, the Xenopus homolog of Drosophila notch.";

RL SCIENCE 249:1438-1441(1990).

RN [2]

RP REVISIONS TO 1759-1782.

RA KINTNER C.;

RL SUBMITTED (JUN-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.



CC -!- DEVELOPMENTAL STAGE: EXPRESSED ALMOST UNIFORMLY IN EARLY EMBRYOS.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).



DR EMBL; M33874; G1364263; -. 

DR PIR; A35844; A35844. 

DR PROSITE; PS00010; ASX_HYDROXYL; 23.

DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 29.

DR PROSITE; PS01187; EGF_CA; 21.

DR PFAM; PF00008; EGF; 36.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.



FT


SIGNAL 




19 


POTENTIAL. 


FT CHAIN 20 2524 NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG.
FT DOMAIN 20 1728 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 1729 1750 POTENTIAL.
FT DOMAIN 1751 2524 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 20 57 EGF-LIKE 1.
FT DOMAIN 58 99 EGF-LIKE 2.
FT DOMAIN 102 140 EGF-LIKE 3.
FT DOMAIN 141 177 EGF-LIKE 4.
FT DOMAIN 179 215 EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 217 254 EGF-LIKE 6.




DOMAIN




292 


EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).


FT DOMAIN 294 332 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 334 370 EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 371 409 EGF-LIKE 10.
FT DOMAIN 411 449 EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 451 487 EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 489 525 EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 527 563 EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 565 600 EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 602 638 EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 640 675 EGF-LIKE 17.
FT DOMAIN 677 713 EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 715 750 EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 752 788 EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 790 826 EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
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866 


EGF-LIKE 22. 


FT 


DOMAIN 


868 


904 


EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL). 


FT 


DOMAIN 
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942 


EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL). 
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DOMAIN 


944 
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EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL). 
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1018 


EGF-LIKE 26. 
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1056 


EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL). 
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1094 


EGF-LIKE 28. 
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EGF-LIKE 29. 
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EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL). 
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1218 


EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 33. 
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1346 


EGF-LIKE 34, 
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EGF-LIKE 35. 
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EGF-LIKE 36. 
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3 X LIN/NOTCH REPEATS. 
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LIN/NOTCH 1, 
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1520 


LIN/NOTCH 2, 
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REPEAT 
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1560 


LIN/NOTCH 3. 
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2083 


6 X ANK MOTIF REPEATS. 
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BY SIMILARITY. 



Tue Jun 1 10:15:58 1999 



US-09-191-647-11.rsp



Page 



FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID
FT DISULFID
FT DISULFID

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID
FT DISULFID
FT DISULFID
FT DISULFID

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 



29 
47 
62 
68 
89 
106 
111 
130 
145 
150 
167 
183 
188 
205 
221 
226 
244 
260 
265 
282 
298 
305 
322 
338 
343 
360 
375 
380 
399 
415 
422 
439 
455 
460 
477 
493 
498 
515 
531 
536 
553 
569 
574 
590 
606 
611 
628 
644 
649 
665 
681 
686 
703 
719 
724 
740 
756 
761 
778 
794 
799 
816 
832 
837 
856 
872 
877 
894 
910 
915 
932 
986 



139 
156 
165 
176 
194 
203 
214 
232 
242 
253 
271 
280 
291 
311 
320 
331 
349 
358 
369 
386 
397 
408 
428 
437 
448 
466 
475 
486 
504 
513 
524 
542 
551 
562 
579 
588 
599 
617 
626 
637 
654 
663 
674 
692 
701 
712 
729 
738 
749 
767 
776 
787 
805 
814 
825 
843 
854 
865 
883 
892 
903 
921 
930 
941 
997 



BY SIMILARITY. 
BY SIMILARITY. 
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BY SIMILARITY, 
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BY SIMILARITY, 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY, 
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BY SIMILARITY. 
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POTENTIAL. 
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POTENTIAL. 



Note: remainder of annotations omitted. 

Query Match 13.9%; Score 121; DB 1; Length 2524; 

Best Local Similarity 45.0%; Pred. No. 3.52e-08; 

Matches 18; Conservative 8; Mismatches 9; Indels 5; Gaps !
RESULT 13
ID PGBM_HUMAN STANDARD; PRT; 4393 AA.
AC P98160; Q16287;
DT 01-OCT-1996 (REL. 34, CREATED)
DT 01-OCT-1996 (REL. 34, LAST SEQUENCE UPDATE)
DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE)
DE BASEMENT MEMBRANE-SPECIFIC HEPARAN SULFATE PROTEOGLYCAN CORE
DE PROTEIN PRECURSOR (HSPG) (PERLECAN) (PLC).
GN HSPG2.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 92112994.
RA KALLUNKI P., TRYGGVASON K.;
RT "Human basement membrane heparan sulfate proteoglycan core protein: a
RT 467-kD protein containing multiple domains resembling elements of the
RT low density lipoprotein receptor, laminin, neural cell adhesion
RT molecules, and epidermal growth factor.";
RL J. CELL BIOL. 116:559-571(1992).
RN [2]
RP SEQUENCE FROM N.A.
RC TISSUE=SKIN, AND COLON;
RX MEDLINE; 92235084.
RA MURDOCH A.D., DODGE G.R., COHEN I., TUAN R.S., IOZZO R.V.;
RT "Primary structure of the human heparan sulfate proteoglycan from
RT basement membrane (HSPG2/perlecan). A chimeric molecule with multiple
RT domains homologous to the low density lipoprotein receptor, laminin,
RT neural cell adhesion molecules, and epidermal growth factor.";
RL J. BIOL. CHEM. 267:8544-8557(1992).
RN [3]
RP SEQUENCE OF 1018-1472 FROM N.A.
RC TISSUE=COLON;
RX MEDLINE; 91365376.
RA DODGE G.R., KOVALSZKY I., CHU M.L., HASSELL J.R., MCBRIDE O.W.,
RA YI H.F., IOZZO R.V.;
RT "Heparan sulfate proteoglycan of human colon: partial molecular
RT cloning, cellular expression, and mapping of the gene (HSPG2) to the
RT short arm of human chromosome 1.";
RL GENOMICS 10:673-680(1991).
RN [4]
RP SEQUENCE OF 892-1398 FROM N.A.
RC TISSUE=FIBROSARCOMA;
RX MEDLINE; 92120660.
RA KALLUNKI P., EDDY R.L., BYERS M.G., KESTILA M., SHOWS T.B.,
RA TRYGGVASON K.;
RT "Cloning of human heparan sulfate proteoglycan core protein,
RT assignment of the gene (HSPG2) to 1p36.1->p35 and identification of
RT a BamHI restriction fragment length polymorphism.";
RL GENOMICS 11:389-396(1991).
RN [5]
RP SEQUENCE OF 1-21 FROM N.A.
RX MEDLINE; 94052171.
RA COHEN I.R., GRAESSEL S., MURDOCH A.D., IOZZO R.V.;
RT "Structural characterization of the complete human perlecan gene and
RT its promoter.";
RL PROC. NATL. ACAD. SCI. U.S.A. 90:10404-10408(1993).
CC -!- FUNCTION: THIS PROTEIN IS AN INTEGRAL COMPONENT OF BASEMENT
CC MEMBRANES. IT IS RESPONSIBLE FOR THE FIXED NEGATIVE ELECTROSTATIC
CC CHARGE AND IS INVOLVED IN THE CHARGE-SELECTIVE ULTRAFILTRATION
CC PROPERTIES. IT INTERACTS WITH OTHER BASEMENT MEMBRANE COMPONENTS
CC SUCH AS LAMININ AND COLLAGEN TYPE IV AND SERVES AS AN ATTACHMENT
CC SUBSTRATE FOR CELLS.
CC -!- SUBUNIT: PURIFIED PERLECAN HAS A STRONG TENDENCY TO AGGREGATE IN
CC DIMERS OR STELLATE STRUCTURES.
CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR.
CC -!- TISSUE SPECIFICITY: FOUND IN THE BASEMENT MEMBRANES.
CC -!- PTM: CONTAINS THREE HEPARAN SULFATE CHAINS AS WELL AS N-LINKED
CC AND O-LINKED OLIGOSACCHARIDES.
CC -!- SIMILARITY: CONTAINS 4 LDL-RECEPTOR CLASS A DOMAINS.
CC -!- SIMILARITY: CONTAINS 10.5 LAMININ EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LAMININ DOMAINS IV.
CC -!- SIMILARITY: BELONGS TO THE IMMUNOGLOBULIN SUPERFAMILY, CONTAINS
CC 22 C2-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LAMININ G-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 4 EGF-LIKE DOMAINS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; X62515; G29470; -.
DR EMBL; M85289; G184427; -.
DR EMBL; M64283; G184425; -.
DR EMBL; S76436; G243371; -.
DR EMBL; L22078; -; NOT_ANNOTATED_CDS.
DR MIM; 142461; -.
DR PROSITE; PS00022; EGF_1; 9.
DR PROSITE; PS01186; EGF_2; 5.
DR PROSITE; PS01209; LDLRA_1; 4.
DR PROSITE; PS01248; LAMININ_TYPE_EGF; 11.
DR PROSITE; PS50068; LDLRA_2; 4.
DR PFAM; PF00008; EGF; 4.
DR PFAM; PF00047; ig; 22.
DR PFAM; PF00052; laminin_B; 3.
DR PFAM; PF00053; laminin_EGF; 8.

DR PFAM; PF00054; laminin_G; 3.
DR PFAM; PF00057; ldl_recept_a; 4.
DR HSSP; P00740; 1IXA.
KW SIGNAL; BASEMENT MEMBRANE; PROTEOGLYCAN; REPEAT; GLYCOPROTEIN;
KW HEPARAN SULFATE; LAMININ EGF-LIKE DOMAIN; IMMUNOGLOBULIN FOLD;
KW EXTRACELLULAR MATRIX; EGF-LIKE DOMAIN.
FT SIGNAL 1 21 POTENTIAL.
FT CHAIN 22 4393 BASEMENT MEMBRANE-SPECIFIC HEPARAN
FT SULFATE PROTEOGLYCAN CORE PROTEIN.
FT DOMAIN 22 193 DOMAIN I (UNIQUE, CONTAINS 3 HS SIDE
FT CHAINS).
FT DOMAIN 194 404 DOMAIN II (4 LDLRA REPEATS).
FT DOMAIN 405 506 DOMAIN II A (1 IGG-REPEAT).
FT DOMAIN 507 1678 DOMAIN III (SIMILAR TO SHORT ARM OF
FT LAMININ A CHAIN).
FT DOMAIN 1679 3688 DOMAIN IV (SIMILAR TO NEURAL CELL
FT ADHESION MOLECULE; 21 IGG REPEATS).
FT DOMAIN 3689 4393 DOMAIN V (C-TERMINAL G-DOMAIN OF LAMININ
FT ALPHA CHAINS AND EGF).
FT DOMAIN 197 236 LDL-RECEPTOR CLASS A 1.
FT DOMAIN 283 321 LDL-RECEPTOR CLASS A 2.
FT DOMAIN LDL-RECEPTOR CLASS A 3.
FT DOMAIN 366 405 LDL-RECEPTOR CLASS A 4.
FT DOMAIN 506 IG-LIKE C2-TYPE DOMAIN 1.
FT DOMAIN LAMININ EGF-LIKE 1 (N-TERMINAL).
FT DOMAIN LAMININ DOMAIN IV 1 (DOMAIN III A).
FT DOMAIN 733 765 LAMININ EGF-LIKE 1 (C-TERMINAL).
FT DOMAIN LAMININ EGF-LIKE 2.
FT DOMAIN 816 873 LAMININ EGF-LIKE 3.
FT DOMAIN 881 925 LAMININ EGF-LIKE 4 (INCOMPLETE).
FT DOMAIN 926 935 LAMININ EGF-LIKE 5 (N-TERMINAL).
FT DOMAIN LAMININ DOMAIN IV 2 (DOMAIN III B).
FT DOMAIN 1128 1160 LAMININ EGF-LIKE 5 (C-TERMINAL).
FT DOMAIN LAMININ EGF-LIKE 6.
FT DOMAIN 1211 1267 LAMININ EGF-LIKE 7.
FT DOMAIN 1277 1326 LAMININ EGF-LIKE 8.
FT DOMAIN 1327 1336 LAMININ EGF-LIKE 9 (N-TERMINAL).
FT DOMAIN 1337 1531 LAMININ DOMAIN IV 3 (DOMAIN III C).
FT DOMAIN 1532 1564 LAMININ EGF-LIKE 9 (C-TERMINAL).
FT DOMAIN 1565 1614 LAMININ EGF-LIKE 10.
FT DOMAIN 1615 1672 LAMININ EGF-LIKE 11.
FT DOMAIN 1679 1773 IG-LIKE C2-TYPE DOMAIN 2.
FT DOMAIN 1774 1867 IG-LIKE C2-TYPE DOMAIN 3.
FT DOMAIN 1868 1957 IG-LIKE C2-TYPE DOMAIN 4.
FT DOMAIN 1958 2053 IG-LIKE C2-TYPE DOMAIN 5.
FT DOMAIN 2054 2153 IG-LIKE C2-TYPE DOMAIN 6.
FT DOMAIN 2154 2246 IG-LIKE C2-TYPE DOMAIN 7.
FT DOMAIN 2247 2342 IG-LIKE C2-TYPE DOMAIN 8.
FT DOMAIN 2343 2438 IG-LIKE C2-TYPE DOMAIN 9.
FT DOMAIN 2439 2535 IG-LIKE C2-TYPE DOMAIN 10.
FT DOMAIN 2536 2631 IG-LIKE C2-TYPE DOMAIN 11.
FT DOMAIN 2632 2728 IG-LIKE C2-TYPE DOMAIN 12.
FT DOMAIN 2828 IG-LIKE C2-TYPE DOMAIN 13.
FT DOMAIN IG-LIKE C2-TYPE DOMAIN 14.
FT DOMAIN 2927 3023 IG-LIKE C2-TYPE DOMAIN 15.
FT DOMAIN 3024 3114 IG-LIKE C2-TYPE DOMAIN 16.
FT DOMAIN 3115 3213 IG-LIKE C2-TYPE DOMAIN 17.
FT DOMAIN 3300 IG-LIKE C2-TYPE DOMAIN 18.
FT DOMAIN 3301 3401 IG-LIKE C2-TYPE DOMAIN 19.
FT DOMAIN 3402 3490 IG-LIKE C2-TYPE DOMAIN 20.
FT DOMAIN 3491 3576 IG-LIKE C2-TYPE DOMAIN 21.
FT DOMAIN 3577 3671 IG-LIKE C2-TYPE DOMAIN 22.
FT DOMAIN 3701 3847 LAMININ G-LIKE 1 (GLOBULAR DOMAIN V A).
FT DOMAIN 3846 3883 EGF-LIKE 1.
FT DOMAIN 3886 3924 EGF-LIKE 2.
FT DOMAIN 3966 4104 LAMININ G-LIKE 2 (GLOBULAR DOMAIN V B).
FT DOMAIN 4106 4143 EGF-LIKE 3.
FT DOMAIN 4145 4178 EGF-LIKE 4.
FT DOMAIN 4243 4391 LAMININ G-LIKE 3 (GLOBULAR DOMAIN V C).
FT SITE 65 67 HEPARAN SULFATE (POTENTIAL).
FT SITE 71 73 HEPARAN SULFATE (POTENTIAL).
FT SITE 76 78 HEPARAN SULFATE (POTENTIAL).
FT SITE 4151 4153 MEDIATES MOTOR NEURON ATTACHMENT
FT (POTENTIAL).

FT SITE 4301 4303 MEDIATES MOTOR NEURON ATTACHMENT 

FT (POTENTIAL). 

FT DISULFID 199 212 BY SIMILARITY. 

FT DISULFID 206 225 BY SIMILARITY. 

FT DISULFID 219 234 BY SIMILARITY.

FT DISULFID 285 297 BY SIMILARITY.

FT DISULFID 292 310 BY SIMILARITY.

FT DISULFID 304 319 BY SIMILARITY.

FT DISULFID 325 337 BY SIMILARITY.

FT DISULFID 332 350 BY SIMILARITY.

FT DISULFID 344 359 BY SIMILARITY.

FT DISULFID 368 381 BY SIMILARITY.

FT DISULFID 375 394 BY SIMILARITY.

FT DISULFID 388 403 BY SIMILARITY.

FT DISULFID 766 775 BY SIMILARITY.

FT DISULFID 768 782 BY SIMILARITY.

FT DISULFID 785 794 BY SIMILARITY.

FT DISULFID 797 813 BY SIMILARITY.

FT DISULFID 816 831 BY SIMILARITY.

FT DISULFID 818 841 BY SIMILARITY.

FT DISULFID 844 853 BY SIMILARITY.

FT DISULFID 856 871 BY SIMILARITY.

FT DISULFID 1161 1170 BY SIMILARITY.

FT DISULFID 1163 1177 BY SIMILARITY. 

FT DISULFID 1180 1189 BY SIMILARITY. 

FT DISULFID 1192 1208 BY SIMILARITY. 

Note: remainder of annotations omitted. 
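
The FT DISULFID records listed above share one layout: line code, feature key, start residue, end residue, and a free-text qualifier. The Python sketch below shows one way such records might be read. The function name `parse_disulfid` and the regular expression are illustrative assumptions, not part of the search program or of any SWISS-PROT tool, and the sketch assumes the single-space, one-record-per-line form used in this listing.

```python
import re

# Hypothetical pattern for the "FT DISULFID <start> <end> <note>" layout
# seen in this listing; whitespace between fields may vary.
FT_DISULFID = re.compile(r"^FT\s+DISULFID\s+(\d+)\s+(\d+)\s*(.*)$")


def parse_disulfid(line):
    """Return (start, end, note) for an FT DISULFID line, else None."""
    match = FT_DISULFID.match(line.strip())
    if match is None:
        return None
    start, end, note = match.groups()
    # Drop trailing punctuation so "BY SIMILARITY." and the OCR
    # variant "BY SIMILARITY," give the same note.
    return int(start), int(end), note.rstrip(" .,")


if __name__ == "__main__":
    sample = [
        "FT DISULFID 199 212 BY SIMILARITY.",
        "FT DISULFID 206 225 BY SIMILARITY.",
        "FT SITE 4301 4303 MEDIATES MOTOR NEURON ATTACHMENT",
    ]
    for text in sample:
        print(parse_disulfid(text))
```

Lines with any other feature key return `None`, so the same loop can be run over a whole feature table without pre-filtering.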

Query Match 13.5%; Score 118; DB 1; Length 4393; 

Best Local Similarity 45.2%; Pred. No. 1.37e-07; 

Matches 14; Conservative 10; Mismatches 4; Indels 3; 

Db 3855 CQNGGQCHDSESS-SYVCVCPAGFTGSRCEH 3884 

|::| III I: : :| |:| :||:| :||: 
Qy 4 CHHG-QCHISDRGEPY-CLCQPGFSGHHCEQ 32 
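
The counts printed with each hit in this listing are consistent with Best Local Similarity being the number of identical positions divided by all aligned columns (Matches + Conservative + Mismatches + Indels); for the alignment above, 14 / (14 + 10 + 4 + 3) = 45.2%. This relationship is inferred from the printed numbers, not taken from the program's documentation. The Python sketch below, with a hypothetical function name, only checks that arithmetic.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent of aligned columns that are identical, to one decimal place.

    Assumes the denominator is the sum of all four printed counts,
    which is an inference from the values in this listing.
    """
    columns = matches + conservative + mismatches + indels
    if columns == 0:
        raise ValueError("alignment has no columns")
    return round(100.0 * matches / columns, 1)


if __name__ == "__main__":
    # Counts copied from summary lines in this listing.
    print(best_local_similarity(14, 10, 4, 3))   # hit of length 4393
    print(best_local_similarity(15, 8, 13, 1))   # hit of length 1964
```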



RESULT 14 

ID LI12_CAEEL STANDARD; PRT; 1429 AA.

AC P14585; 

DT 01-JAN-1990 (REL. 13, CREATED) 

DT 01-JAN-1990 (REL. 13, LAST SEQUENCE UPDATE) 

DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)

DE LIN-12 PROTEIN PRECURSOR.

GN LIN-12 OR R107.8.

OS CAENORHABDITIS ELEGANS.

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA; 

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=BRISTOL N2; 

RX MEDLINE; 88334747. 

RA YOCHEM J., WESTON K., GREENWALD I.;

RT "The Caenorhabditis elegans lin-12 gene encodes a transmembrane 

RT protein with overall similarity to Drosophila Notch."; 

RL NATURE 335:547-550(1988). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=BRISTOL N2; 

RX MEDLINE; 94150718. 

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FRASER A.,

RA FULTON L., GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M.,

RA JOHNSTON L., JONES M., KERSHAW J., KIRSTEN J., LAISSTER N.,

RA LATREILLE P., LIGHTNING J., LLOYD C., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SIMS M., SMALDON N., SMITH A., SMITH M., SONNHAMMER E., STADEN R.,

RA SULSTON J., THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K.,

RA WATERSON R., WATSON A., WEINSTOCK L., WILKINSON-SPROAT J.,

RA WOHLDMAN P.;



RT 
RT 


"2.2 Mb of contiguous nucleotide sequence from chromosome III of C. 


RL 


NATURE 3 


8:32-38(1994), 




CC -!- FUNCTION: LIN-12 IS INVOLVED IN SEVERAL CELL FATES DECISIONS

CC THAT REQUIRES CELL-CELL INTERACTIONS. IT IS POSSIBLE THAT LIN-12

CC ENCODES A MEMBRANE-BOUND RECEPTOR FOR A SIGNAL THAT ENABLES

CC EXPRESSION OF THE VENTRAL UTERINE PRECURSOR CELL FATE.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- SIMILARITY: HIGH, TO C. ELEGANS GLP-1.

CC -!- SIMILARITY: CONTAINS 13 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS ANK REPEATS.

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).


DR EMBL; M12069; G156358; -.

DR EMBL; Z14092; E1348691; -.

DR PIR; S06434; S06434.

DR WORMPEP; R107.8; CE00274.

DR PROSITE; PS00010; ASX_HYDROXYL; 3.

DR PROSITE; PS00022; EGF_1; 12.

DR PROSITE; PS01186; EGF_2; 11.

DR PROSITE; PS01187; EGF_CA; 2.

DR PFAM; PF00008; EGF; 13.

DR PFAM; PF00023; ank; 4.

DR PFAM; PF00066; notch; 3.

DR HSSP; P00740; 1IXA.

KW DIFFERENTIATION; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE;

KW GLYCOPROTEIN; SIGNAL.




FT SIGNAL 1 15 POTENTIAL.

FT CHAIN 16 1429 LIN-12 PROTEIN.

FT DOMAIN 16 908 EXTRACELLULAR (POTENTIAL).

FT TRANSMEM 909 931 POTENTIAL.

FT DOMAIN 932 1429 CYTOPLASMIC (POTENTIAL).

FT DOMAIN 24 618 13 X EGF-TYPE REPEATS.

FT DOMAIN 631 750 3 X LIN/NOTCH REPEATS.

FT DOMAIN 1046 1266 6 X ANK MOTIF REPEATS.

FT DOMAIN 20 61 EGF-LIKE 1.

FT DOMAIN 114 150 EGF-LIKE 2.

FT DOMAIN 152 190 EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL).

FT DOMAIN 201 246 EGF-LIKE 4.

FT DOMAIN 250 285 EGF-LIKE 5.

FT DOMAIN 287 323 EGF-LIKE 6.

FT DOMAIN 323 363 EGF-LIKE 7.

FT DOMAIN 365 402 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).

FT DOMAIN 404 441 EGF-LIKE 9.

FT DOMAIN 449 492 EGF-LIKE 10.

FT DOMAIN 503 541 EGF-LIKE 11.

FT DOMAIN 543 579 EGF-LIKE 12.

FT DOMAIN 582 619 EGF-LIKE 13.

FT REPEAT 635 669 LIN/NOTCH 1.

FT REPEAT 670 710 LIN/NOTCH 2.

FT REPEAT 711 750 LIN/NOTCH 3.

FT REPEAT 1046 1078 ANK MOTIF 1.

FT REPEAT 1079 1119 ANK MOTIF 2.

FT REPEAT 1120 1152 ANK MOTIF 3.

FT REPEAT 1153 1188 ANK MOTIF 4.

FT REPEAT 1189 1232 ANK MOTIF 5.

FT REPEAT 1233 1266 ANK MOTIF 6.

FT DISULFID 24 35 BY SIMILARITY.

FT DISULFID 29 49 BY SIMILARITY.

FT DISULFID 51 60 BY SIMILARITY.

FT DISULFID 118 129 BY SIMILARITY.

FT DISULFID 123 138 BY SIMILARITY.

FT DISULFID 140 149 BY SIMILARITY.

FT DISULFID 156 169 BY SIMILARITY.

FT DISULFID 163 178 BY SIMILARITY.

FT DISULFID 180 189 BY SIMILARITY.

FT DISULFID 205 227 BY SIMILARITY.

FT DISULFID 221 234 BY SIMILARITY.

FT DISULFID 236 245 BY SIMILARITY.

FT DISULFID 254 264 BY SIMILARITY.

FT DISULFID 259 273 BY SIMILARITY.

FT DISULFID 275 284 BY SIMILARITY.

FT DISULFID 291 302 BY SIMILARITY.

FT DISULFID 296 311 BY SIMILARITY.

FT DISULFID 313 322 BY SIMILARITY.

FT DISULFID 327 339 BY SIMILARITY.

FT DISULFID 334 351 BY SIMILARITY.

FT DISULFID 353 362 BY SIMILARITY.

FT DISULFID 369 381 BY SIMILARITY.

FT DISULFID 375 390 BY SIMILARITY.

FT DISULFID 392 401 BY SIMILARITY.

FT DISULFID 408 419 BY SIMILARITY.

FT DISULFID 413 429 BY SIMILARITY.

FT DISULFID 431 440 BY SIMILARITY.

FT DISULFID 507 518 BY SIMILARITY.

FT DISULFID 512 529 BY SIMILARITY.

FT DISULFID 531 540 BY SIMILARITY.

FT DISULFID 547 558 BY SIMILARITY.

FT DISULFID 552 567 BY SIMILARITY.

FT DISULFID 569 578 BY SIMILARITY.

FT DISULFID 586 597 BY SIMILARITY.

FT DISULFID 591 607 BY SIMILARITY.

FT DISULFID 609 618 BY SIMILARITY.

FT CARBOHYD 41 41 POTENTIAL.

FT CARBOHYD 165 165 POTENTIAL.

FT CARBOHYD 194 194 POTENTIAL.

FT CARBOHYD 378 378 POTENTIAL.

FT CARBOHYD 515 515 POTENTIAL.

FT CARBOHYD 623 623 POTENTIAL.

FT CARBOHYD 751 751 POTENTIAL.

FT CARBOHYD 754 754 POTENTIAL.

FT CARBOHYD 900 900 POTENTIAL.

SQ SEQUENCE 1429 AA; 157115 MW; CFD2CCA4 CRC32;



Query Match 13.2%; Score 115; DB 1; Length 1429; 

Best Local Similarity 43.6%; Pred. No. 5.24e-07;

Matches 17; Conservative 9; Mismatches 10; Indels 3; Gaps 3;



Db 334 CNHGTCIDSPLSEKAFECQCEPGYEGILCEQCKNECLSE 372

1:11 I I :| :: I hlh I III: I |::| 
Qy 4 CHHGQCHISDRGE-PY-CLCQPGFSGHHCEQE-NPCMGE 39



RESULT 15

ID CYR61_HUMAN STANDARD; PRT; 381 AA.
AC 000622; 014934;
DT 15-JUL-1998 (REL. 36, CREATED)
DT 15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE)
DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE)
DE CYR61 PROTEIN PRECURSOR (GIG1 PROTEIN) (INSULIN-LIKE GROWTH FACTOR-
DE BINDING PROTEIN 10).
GN IGFBP10 OR CYR61 OR GIG1.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RA ALBRECHT C., VON DER KAMMER H., KLAUDINY J., MAYHAUS M., NITSCH R.M.;
RL SUBMITTED (JUN-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.
RN [2]
RP SEQUENCE FROM N.A.
RX MEDLINE; 97280750.
RA JAY P., BERGE-LEFRANC J.L., MARSOLLIER C., MEJEAN C., TAVIAUX S.,
RA BERTA P.;
RT "The human growth factor-inducible immediate early gene, CYR61, maps
RT to chromosome 1p.";
RL ONCOGENE 14:1753-1757(1997).
RN [3]
RP SEQUENCE FROM N.A.
RC TISSUE=PLACENTA;
RA KOLESNIKOVA T.V., LAU L.F.;
RL SUBMITTED (JUN-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.
RN [4]
RP SEQUENCE FROM N.A.
RA BI A.B., YU L.;
RL SUBMITTED (NOV-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.
CC -!- FUNCTION: MAY ACT AS ONE OF THE MANY GROWTH FACTOR-BINDING
CC PROTEINS; PROMOTES PROLIFERATION, MIGRATION AND ADHESION (BY
CC SIMILARITY).
CC -!- SIMILARITY: BELONGS TO THE INSULIN-LIKE GROWTH FACTOR BINDING
CC PROTEIN FAMILY. CEF-10/CYR61/CTFG/FISP-12/NOV PROTEIN SUBFAMILY.
CC -!- SIMILARITY: CONTAINS 1 VWFC DOMAIN.
CC -!- SIMILARITY: CONTAINS 1 C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK).
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
DR EMBL; Y12084; E311857; -.
DR EMBL; U62015; G2130527; -.
DR EMBL; AF003594; G2196782; -.
DR EMBL; AF031385; G2606094; -.
DR MIM; 602369; -.
DR PROSITE; PS00222; IGF_BINDING; 1.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01225; CTCK_2; 1.
DR PROSITE; PS01208; VWFC; 1.
DR PFAM; PF00007; Cys_knot; 1.
DR PFAM; PF00090; tsp_1; 1.
DR PFAM; PF00093; vwc; 1.
DR PFAM; PF00219; IGFBP; 1.
KW GROWTH FACTOR BINDING; SIGNAL.
FT SIGNAL 1 24 POTENTIAL.
FT CHAIN 25 381 CYR61 PROTEIN.
FT DOMAIN 98 164 VWFC.
FT DOMAIN 286 360 CTCK.
FT DISULFID 286 323 BY SIMILARITY.
FT DISULFID 303 337 BY SIMILARITY.
FT DISULFID 314 353 BY SIMILARITY.
FT DISULFID 317 355 BY SIMILARITY.
FT DISULFID 322 359 BY SIMILARITY.
FT CONFLICT 210 210 L -> I (IN REF. 4).
FT CONFLICT 220 220 L -> R (IN REF. 4).
SQ SEQUENCE 381 AA; 42026 MW; 2B091D9E CRC32;

Query Match 13.1%; Score 114; DB 1; L

Matches 20; Conservative 10; Mismatches 29; Indels 3; Gaps 3;

Db 300 YAGCLSVKKYRPKYC-GSCVDGRCCTPQLTRTVKMRFRCEDGETFSKNVMMIQSCKCNYN 358

11:1 : I : I 1:1 II I :: I |:| II :| :| II
Qy 51 YASCATASKVPIMECRGGC-GTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCR-A 108

Db 359 CP 360

|:

Qy 109 CS 110



Search completed: Fri May 28 09:25:47 1999 
Job time : 19 secs.
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******************** 



Release 3.1A John F. Collins, Biocomputing Research Unit. 
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 09:26:05 1999; MasPar time 10.25 Seconds

585.992 Million cell updates/sec 

Tabular output not generated. 



Title:          >US-09-191-647-11
Description:    (1-110) from US09191647.pep
Perfect Score:  873
Sequence:       1 AFKCHHGQCHISDRGEPYCL GSSFVEEVERHLECGCRACS 110

Scoring table:  PAM 150
                Gap 11

179066 seqs, 54579741 residues 
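
As a rough cross-check of the throughput figure in this run header, the sketch below recomputes "million cell updates per second" from the query length, the database size, and the reported MasPar time. It assumes that one cell update corresponds to one (query residue, database residue) pair of the Smith-Waterman matrix; that definition and the function name are assumptions for illustration, not something the report states.

```python
# Figures copied from the run header above.
QUERY_LENGTH = 110          # residues in the query, (1-110)
DB_RESIDUES = 54_579_741    # residues searched
ELAPSED_SECONDS = 10.25     # reported MasPar time (rounded in the report)
REPORTED_MCUPS = 585.992    # reported million cell updates per second


def million_cell_updates_per_second(query_len: int, db_residues: int, seconds: float) -> float:
    """Assumed definition: one update per (query residue, database residue) cell."""
    return query_len * db_residues / seconds / 1e6


estimate = million_cell_updates_per_second(QUERY_LENGTH, DB_RESIDUES, ELAPSED_SECONDS)
relative_gap = abs(estimate - REPORTED_MCUPS) / REPORTED_MCUPS
print(f"estimated {estimate:.3f} vs reported {REPORTED_MCUPS} (relative gap {relative_gap:.3%})")
```

Under that assumption the estimate lands within a small fraction of a percent of the reported value; the remaining gap is consistent with the elapsed time being printed to only two decimals.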



Post-processing: Minimum Match 0% 

Listing first 45 summaries 



sptrembl9 

1:sp_archea 2:sp_bacteria 3:sp_fungi 4:sp_human
5:sp_invertebrate 6:sp_mammal 7:sp_mhc 8:sp_organelle
9:sp_phage 10:sp_plant 11:sp_rodent 12:sp_unclassified
13:sp_vertebrate 14:sp_virus

Statistics: Mean 37.119; Variance 57.890; scale 0.641

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



           Query
No. Score  Match Length  DB  ID      Description             Pred. No.

  1   836   95.8   1523  11  088280  MEGF5.                  3.63e-188
  2   798   91.4    739   4  075094  MEGF5 (FRAGMENT).       4.81e-178
  3   442   50.6   1531  11  088279  MEGF4.                  9.62e-85
  4   310   35.5     79   4  075093  MEGF4 (FRAGMENT).       1.76e-51
  5   185   21.2    601   5  Q20204  F40E10.4 PROTEIN (FRAG  9.57e-22
  6   128   14.7   1722   5  Q19350  SIMILAR TO EGF-LIKE RE  1.79e-09
  7   124   14.2   2447  13  013149  NOTCH 2 (FRAGMENT).     1.14e-08
  8   123   14.1    434  11  055139  JAGGED2 PROTEIN (FRAGM  1.81e-08
  9   123   14.1    518  11  070219  JAGGED 2 (JAGGED 2 PRO  1.81e-08
 10   123   14.1   1202  11  P97607  JAGGED2 (FRAGMENT).     1.81e-08
 11   121   13.9    955   4  Q99466  NOTCH4 (FRAGMENT).      4.52e-08
 12   121   13.9   1687  11  Q61204  NOTCH2-LIKE (EGF REPEA  4.52e-08
 13   121   13.9   1964  11  035442  NOTCH4.                 4.52e-08
 14   121   13.9   1999   4  Q99940  NOTCH4.                 4.52e-08
 15   121   13.9   2003   4  000306  NOTCH4.                 4.52e-08
 16   121   13.9   2470  11  035516  CELL SURFACE PROTEIN.   4.52e-08
 17   118   13.5    169   4  014944  EPIREGULIN.             1.76e-07
 18   117   13.4    530   5  Q24526  SLIT LOCUS ENCODING A   2.77e-07
 19   117   13.4   2653   5  025253  NOTCH HOMOLOG SCALLOPE  2.77e-07
 20   116   13.3    762  13  042373  NOTCH RECEPTOR PROTEIN  4.33e-07
 21   115   13.2    372   5  Q21756  HYPOTHETICAL 39.1 KD P  6.78e-07
 22   115   13.2   1203  11  Q06008  NOTCH PROTEIN HOMOLOG   6.78e-07
 23   115   13.2   2352   5  061240  HRNOTCH PROTEIN.        6.78e-07
 24   114   13.1    381   4  043775  CYR61 PROTEIN.          1.06e-06
 25   114   13.1    728  13  Q90656  TRANSMEMBRANE PROTEIN   1.06e-06
 26   112   12.8    367  11  054775  ELM1.                   2.57e-06
 27   111   12.7    752  13  042374  NOTCH RECEPTOR PROTEIN  3.99e-06
 28   110   12.6    717  13  P87357  DELTAD TRANSMEMBRANE P  6.19e-06
 29   109   12.5    162  11  Q61521  EPIREGULIN.             9.59e-06
 30   109   12.5   1212  13  042347  C-SERRATE-2 (FRAGMENT)  9.59e-06
 31   109   12.5   2180   5  001768  SIMILARITY TO EGF-LIKE  9.59e-06
 32   108   12.4    308   6  046370  PREADIPOCYTE FACTOR-1.  1.48e-05
 33   108   12.4    406   5  025059  FIBROPELLIN III (FRAGM  1.48e-05
 34   108   12.4   1095   4  Q99458  NOTCH4 (FRAGMENT).      1.48e-05
 35   108   12.4   1476  13  Q90285  PUTATIVE EXTRACELLULAR  1.48e-05
 36   107   12.3    383  11  070534  ZOG.                    2.29e-05
 37   107   12.3    383  11  Q62779  PREADIPOCYTE FACTOR 1.  2.29e-05
 38   107   12.3    502   5  017692  MEC-9 PROTEIN.          2.29e-05
 39   107   12.3    661   5  061537  SPERM TRANSMEMBRANE PR  2.29e-05
 40   107   12.3    838   5  Q18761  MEC-9 PROTEIN.          2.29e-05
 41   107   12.3    838   5  027422  MEC-9L.                 2.29e-05
 42   106   12.1    293   5  Q20979  F58E6.3 PROTEIN.        3.52e-05
 43   106   12.1    387  11  Q06007  NOTCH PROTEIN HOMOLOG   3.52e-05
 44   106   12.1    802  13  057462  DELTAA.                 3.52e-05
 45   106   12.1   3857  11  088840  MUTANT FIBRILLIN-1.     3.52e-05
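
The Query Match column in the summaries above appears to be the hit score expressed as a percentage of the Perfect Score (873) given in the run header. The report does not state this formula; it is inferred here from the printed numbers, and the function name below is hypothetical. A minimal sketch that checks the inferred relation against a few rows:

```python
PERFECT_SCORE = 873  # "Perfect Score" from the run header

# (score, reported Query Match %) pairs copied from summary rows 1-5 and 45.
ROWS = [(836, 95.8), (798, 91.4), (442, 50.6), (310, 35.5), (185, 21.2), (106, 12.1)]


def query_match_percent(score: int, perfect_score: int) -> float:
    """Inferred relation: score as a percentage of the perfect (self-match) score."""
    return round(100.0 * score / perfect_score, 1)


for score, reported in ROWS:
    computed = query_match_percent(score, PERFECT_SCORE)
    status = "ok" if abs(computed - reported) < 0.05 else "differs"
    print(f"score {score:>3}: reported {reported:>5}, computed {computed:>5} ({status})")
```

The same check can be used to spot OCR damage in the score or percentage columns, since a row whose two values disagree is likely to contain a misread digit.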



RESULT 1

ID 088280 PRELIMINARY; PRT; 1523 AA.
AC 088280;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF5.
GN MEGF5.

OS RATTUS NORVEGICUS (RAT).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;

OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089.

RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;

RT "Identification of high-molecular-weight proteins with multiple

RT EGF-like motifs by motif-trap screening.";

RL GENOMICS 51:27-34(1998).

DR EMBL; AB011531; D1033424; -.

DR PROSITE; PS01185; CTCK_1; 1.

DR PROSITE; PS01186; EGF_2; 7.

DR PROSITE; PS01187; EGF_CA; 2.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 1523 AA; 167767 MW; 2BD845D0 CRC32;



Query Match 95.8%; Score 836; DB 11; Length 1523; 

Best Local Similarity 94.5%; Pred. No. 3.63e-188; 

Matches 104; Conservative 3; Mismatches 3; Indels 0; Gaps 0; 

Db 1414 AFKCHHGQCHISDRGEPYCLCQPGFSGNHCEQENPCLGEIVREAIRRQKDYASCATASKV 1473

        |||||||||||||||||||||||||||:||||||||:|||||||||||||||||||||||
Qy    1 AFKCHHGQCHISDRGEPYCLCQPGFSGHHCEQENPCMGEIVREAIRRQKDYASCATASKV 60

Db 1474 PIMVCRGGCGSQCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCRECS 1523

        ||| ||||||: ||||||||||||||||||||||||||||||||||| ||
Qy   61 PIMECRGGCGTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCRACS 110
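
The per-hit statistics can be cross-checked against the alignment itself. In this listing, Best Local Similarity appears to equal Matches divided by the sum of Matches, Conservative, Mismatches and Indels; that relation is inferred from the printed figures, not stated by the report. The sketch below counts identical columns in the Result 1 alignment and recomputes the percentage. It assumes the residue at Db position 1456, which is damaged in the scan, reads E; the helper names are hypothetical.

```python
# Aligned segments of Result 1 (Db 1414-1523 against Qy 1-110), no gaps.
DB_SEGMENT = (
    "AFKCHHGQCHISDRGEPYCLCQPGFSGNHCEQENPCLGEIVREAIRRQKDYASCATASKV"
    "PIMVCRGGCGSQCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCRECS"
)
QUERY_SEGMENT = (
    "AFKCHHGQCHISDRGEPYCLCQPGFSGHHCEQENPCMGEIVREAIRRQKDYASCATASKV"
    "PIMECRGGCGTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCRACS"
)


def count_identities(db: str, query: str) -> int:
    """Count alignment columns where both sequences carry the same residue."""
    return sum(1 for d, q in zip(db, query) if d == q and d != "-")


def best_local_similarity(matches: int, conservative: int, mismatches: int, indels: int) -> float:
    """Inferred relation: identities as a percentage of all alignment columns."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


identities = count_identities(DB_SEGMENT, QUERY_SEGMENT)
similarity = best_local_similarity(matches=104, conservative=3, mismatches=3, indels=0)
print(identities, similarity)
```

Counting identities this way only needs the Db and Qy lines, so it still works where the match line between them did not survive the scan. Distinguishing conservative substitutions from mismatches would additionally require the PAM 150 scoring table, which the report does not reproduce.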



RESULT 2 

ID 075094 PRELIMINARY; PRT; 739 AA. 
AC 075094; 

DT 01-NOV-1998 (TREMBLREL. 08, CREATED) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE) 
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DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 
DE MEGF5 (FRAGMENT). 
GN MEGF5. 

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 
OC CATARRHINI; HOMINIDAE; HOMO. 
RN [1]

RP SEQUENCE FROM N.A.

RC TISSUE=BRAIN;

RX MEDLINE; 98360089.

RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;

RT "Identification of high-molecular-weight proteins with multiple

RT EGF-like motifs by motif-trap screening.";

RL GENOMICS 51:27-34(1998).

DR EMBL; AB011538; D1033429; -.

DR PROSITE; PS01185; CTCK_1; 1.

DR PROSITE; PS01186; EGF_2; 7.

DR PROSITE; PS01187; EGF_CA; 2.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 739 AA; 80364 MW; DC6BCB63 CRC32;
Query Match 91.4%; Score 798; DB 4; Length 739;

Best Local Similarity 89.1%; Pred. No. 4.81e-178;
Matches 98; Conservative 5; Mismatches 7; Indels 0; Gaps 0;

Db 630 AFKCHHGQCHISDQGEPYCLCQPGFSGEHCQQENPCLGQVVREVIRRQKGYASCATASKV 689

MIMMIIMMMMMIIMIMI Mill llilllllll 

Qy 1 AFKCHHGQCHISDRGEPYCLCQPGFSGHHCEQENPCMGEIVREAIRRQKDYASCATASKV 60

Db 690 PIMECRGGCGPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGCLACS 739 

llilllllll Mil IIIMIIIIIIIIIIIIIIIIIIIIIIII III 
Qy 61 PIMECRGGCGTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCRACS 110



RESULT 3 

ID 088279 PRELIMINARY; PRT; 1531 AA. 
AC 088279; 

DT 01-NOV-1998 (TREMBLREL. 08, CREATED) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE MEGF4. 

GN MEGF4. 

OS RATTUS NORVEGICUS (RAT). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;

RT "Identification of high-molecular-weight proteins with multiple

RT EGF-like motifs by motif-trap screening.";

RL GENOMICS 51:27-34(1998).

DR EMBL; AB011530; D1033423; -.

DR PROSITE; PS01185; CTCK_1; 1.

DR PROSITE; PS01186; EGF_2; 8.

DR PROSITE; PS01187; EGF_CA; 2.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 1531 AA; 167497 MW; 5C5EBDF4 CRC32;

Query Match 50.6%; Score 442; DB 11; Length 1531;

Best Local Similarity 50.0%; Pred. No. 9.62e-85; 

Matches 56; Conservative 22; Mismatches 31; Indels 3; Gaps 3; 

Db 1421 GLQCLHGHCQASATRG-AHCVCSPGFSGELCEQESECRGDPVRDFHRVQRGYAICQTTRP 1479 

I 11:1: I II : Ml Mill MM: I I: IM I |: II I I: 
Qy 1 AFKCHHGQCHISD-RGEPYCLCQPGFSGHHCEQENPCMGEIVREAIRRQKDYASCATASK 59 

Db 1480 LSWVECRGACPGQGCCQGLRLKRRKLTFECSDGTSFAEEVEKPTKCGCAPCA 1531 

MIMM I Ml M MM 1 : 1 : 1 1 : 1 1 III: Ml :|: 
Qy 60 VPIMECRGGC-GTTCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCRACS 110 



RESULT 4 

ID 075093 PRELIMINARY; PRT; 79 AA. 

AC 075093; 

DT 01-NOV-1998 (TREMBLREL. 08, CREATED) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE MEGF4 (FRAGMENT).

GN MEGF4.

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;

OC CATARRHINI; HOMINIDAE; HOMO.

RN [1]

RP SEQUENCE FROM N.A.

RC TISSUE=BRAIN;

RX MEDLINE; 98360089.

RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;

RT "Identification of high-molecular-weight proteins with multiple

RT EGF-like motifs by motif-trap screening.";

RL GENOMICS 51:27-34(1998).

DR EMBL; AB011537; D1033428; -.

DR PROSITE; PS01185; CTCK_1; 1.

FT NON_TER 1 1

SQ SEQUENCE 79 AA; 8809 MW; 96C95FFE CRC32;

Query Match 35.5%; Score 310; DB 4; Length 79;

Best Local Similarity 48.1%; Pred. No. 1.76e-51;

Matches 38; Conservative 17; Mismatches 23; Indels 1; Gaps 1; 

Db 1 ESECRGDPVRDFHQVQRGYAICQTTRPLSWVECRGSCPGQGCCQGLRLKRRKFTFECSDG 60 

h I \- IM : I: II I |: :: :||l|: | Ml M III: I : I : f I 
Qy 33 ENPCMGEIVREAIRRQKDYASCATASKVPIMECRGGC-GTTCCQPIRSKRRKYVFQCTDG 91 

Db 61 TSFAEEVEKPTKCGCALCA 79 

Ml MM: Ml |: 
Qy 92 SSFVEEVERHLECGCRACS 110 



RESULT 5 

ID Q20204 PRELIMINARY; PRT; 601 AA. 

AC Q20204; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE) 

DE F40E10.4 PROTEIN (FRAGMENT).

GN F40E10.4.

OS CAENORHABDITIS ELEGANS.

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA;

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS.

RN [1]

RP SEQUENCE FROM N.A.

RA SMYER.;

RL SUBMITTED (FEB-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.

RN [2]

RP SEQUENCE FROM N.A.

RX MEDLINE; 94150718.

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,

RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,

RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,

RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,

RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,

RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.

RT elegans.";

RL NATURE 368:32-38(1994).

DR EMBL; Z69792; E1346459; -.

DR PROSITE; PS01187; EGF_CA; 1.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1
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SQ SEQUENCE 601 AA; 66669 MW; F8A72773 CRC32; 



Query Match 21.2%; Score 185; DB 5; Length 601;

Best Local Similarity 28.2%; Pred. No. 9.57e-22;

Matches 31; Conservative 26; Mismatches 46; Indels 7; Gaps 5;

Db 485 G IDCGNGKCTNNALS PKG YMCQCDSHFSGEHCD - EKRI KCD KQKFRRHHIENECRSVD 541
:: I :| I : III:: III ||: |: : : :: ;||: | : 
Qy 1 AFKCHHGQCHISD-RGEPY-CLCQPGFSGHHCEQENPCMGEIVREAIRRQKDYASCATAS 58

Db 542 RIKIAECNGYCGGEQNCCTAVKKKQRKVKMICKNGTTKISTVHIIRQCQC 591 

I II I II II ::: |:|| I :|:: : I :| | 
Qy 59 KVPIMECRGGCGT--TCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGC 106 



RESULT 6 

ID Q19350 PRELIMINARY; PRT; 1722 AA.
AC Q19350;
DT 01-NOV-1996 (TREMBLREL. 01, CREATED)
DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE SIMILAR TO EGF-LIKE REPEATS, NCBI GI: 1125776.

GN F11C7.4.

OS CAENORHABDITIS ELEGANS.

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA;

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS.

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=BRISTOL N2;

RX MEDLINE; 94150718.

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,

RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,

RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,

RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,

RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,

RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.

RT elegans.";

RL NATURE 368:32-38(1994).

RN [2]

RP SEQUENCE FROM N.A.

RC STRAIN=BRISTOL N2;

RA TAICH A., VETTER J.;

RL SUBMITTED (JAN-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; U42839; G1125776; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 5.

DR PROSITE; PS01186; EGF_2; 19.

DR PROSITE; PS01187; EGF_CA; 3.

DR PFAM; PF00008; EGF; 24.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 1722 AA; 188383 MW; CCFB86B8 CRC32;

Query Match 14.7%; Score 128; DB 5; Length 1722;

Best Local Similarity 38.9%; Pred. No. 1.79e-09;

Matches 14; Conservative 8; Mismatches 13; Indels 1; Gaps 1; 



Db 372 IRCLNGGSCKLDAEGEPFCVCEEGFDGPFCEPKSGC 407 

::| :| I I f I : I : I : III II : I 
Qy 2 FKCHHG-QCHISDRGEPYCLCQPGFSGHHCEQENPC 36 



RESULT 7 

ID 013149 PRELIMINARY; PRT; 2447 AA. 

AC 013149; 

DT 01-JUL-1997 (TREMBLREL. 04, CREATED)

DT 01-JUL-1997 (TREMBLREL. 04, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE NOTCH 2 (FRAGMENT). 

OS FUGU RUBRIPES (JAPANESE PUFFERFISH). 



OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII; 

OC TELEOSTEI; EUTELEOSTEI; ACANTHOPTERYGII; PERCOMORPHA; 

OC TETRAODONTIFORMES; TETRAODONTOIDEI; TETRAODONTIDAE; FUGU.

RN [1] 

RP SEQUENCE FROM N.A. 

RA NAKAMURA T., TROWSDALE J.; 

RL SUBMITTED (JUN-1997) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; AB004829; D1021371; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS01186; EGF_2; 29.

DR PROSITE; PS01187; EGF_CA; 20.

DR PFAM; PF00008; EGF; 35.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 2447 AA; 262542 MW; 3CDA4F7A CRC32;

Query Match 14.2%; Score 124; DB 13; Length 2447;

Best Local Similarity 42.9%; Pred. No. 1.14e-08;

Matches 15; Conservative 8; Mismatches 10; Indels 2; Gaps 2;

Db 683 CVHGKC-IEQQNGYFCQCEAGWVGQHCEQEKDECL 716

Mill:: : |::| |:||||l : |: 
Qy 4 CHHGQCHISDRGEPYCLCQPGFSGHHCEQE-NPCM 37 



RESULT 8

ID 055139 PRELIMINARY; PRT; 434 AA.

AC 055139;

DT 01-JUN-1998 (TREMBLREL. 06, CREATED)

DT 01-JUN-1998 (TREMBLREL. 06, LAST SEQUENCE UPDATE)

DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE)

DE JAGGED2 PROTEIN (FRAGMENT).

GN JAG2.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1]

RP SEQUENCE FROM N.A.

RC TISSUE=BRAIN;

RA VALSECCHI V., BALLABIO A., RUGARLI E.I.;

RL SUBMITTED (JUL-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; Y14495; E1227811; -.

DR PROSITE; PS01186; EGF_2; 7.

DR PROSITE; PS01187; EGF_CA; 5.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

FT NON_TER 434 434

SQ SEQUENCE 434 AA; 46176 MW; D68AE029 CRC32;

Query Match 14.1%; Score 123; DB 11; Length 434;

Best Local Similarity 36.4%; Pred. No. 1.81e-08;

Matches 16; Conservative 13; Mismatches 11; Indels 4; Gaps 4; 

Db 282 CGPHGHC-VSLPGGNFSCICDSGFTGTYCHENIDDCMGQPCRNG 324 

I 11:1 :| I : |:|;:||:| I :: : III: |:: 
Qy 4 C-HHGQCHISDRGEPY-CLCQPGFSGHHCEQE-NPCMGEIVREA 44 



RESULT 9 

ID 070219 PRELIMINARY; PRT; 518 AA. 

AC 070219; 

DT 01-AUG-1998 (TREMBLREL. 07, CREATED) 

DT 01-AUG-1998 (TREMBLREL. 07, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE JAGGED 2 (JAGGED 2 PROTEIN) (FRAGMENT). 

GN JAG2.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1]

RP SEQUENCE FROM N.A.
RC TISSUE=BRAIN;

RA LAN Y., JIANG R., SHAWBER C., WEINMASTER G., GRIDLEY T.;

RL SUBMITTED (JUN-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; AF010137; G3057059; -.

DR MGD; MGI:1098270; JAG2.

DR PROSITE; PS01187; EGF_CA; 5.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 518 518

SQ SEQUENCE 518 AA; 55119 MW; 7144EB20 CRC32; 

Query Match 14.1%; Score 123; DB 11; Length 518; 

Best Local Similarity 36.4%; Pred. No. 1.81e-08; 

Matches 16; Conservative 13; Mismatches 11; Indels 4; Gaps 4; 

Db 305 CGPHGHC-VSLPGGNFSCICDSGFTGTYCHENIDDCMGQPCRNG 347 

I Ihl :| I : |:|::||:| I :: : |||: |:: 
Qy 4 C-HHGQCHISDRGEPY-CLCQPGFSGHHCEQE-NPCMGEIVREA 44 



RESULT 10 

ID P97607 PRELIMINARY; PRT; 1202 AA.
AC P97607;
DT 01-MAY-1997 (TREMBLREL. 03, CREATED)

DT 01-MAY-1997 (TREMBLREL. 03, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE JAGGED2 (FRAGMENT).

OS RATTUS NORVEGICUS (RAT).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 97105852.

RA SHAWBER C.J., BOULTER J., LINDSELL C.E., WEINMASTER G.;

RT "Jagged2: a serrate-like gene expressed during rat embryogenesis.";

RL DEV. BIOL. 180:370-376(1996).

DR EMBL; U70050; G1718248; -.

DR PROSITE; PS01186; EGF_2; 11.

DR PROSITE; PS01187; EGF_CA; 7.

DR PFAM; PF00008; EGF; 14.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 1 1

SQ SEQUENCE 1202 AA; 129703 MW; 697F4205 CRC32;

Query Match 14.1%; Score 123; DB 11; Length 1202;

Best Local Similarity 36.4%; Pred. No. 1.81e-08;

Matches 16; Conservative 13; Mismatches 11; Indels 4; Gaps 4;

Db 561 CGPHGHC-VSLPGGNFSCICDSGFTGTYCHENIDDCMGQPCRNG 603

Qy 4 C-HHGQCHISDRGEPY-CLCQPGFSGHHCEQE-NPCMGEIVREA 44



RESULT 11 

ID Q99466 PRELIMINARY; PRT; 955 AA. 

AC Q99466; 

DT 01-MAY-1997 (TREMBLREL. 03, CREATED) 

DT 01-MAY-1997 (TREMBLREL. 03, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE NOTCH4 (FRAGMENT).

GN NOTCH4.

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 

OC CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 97311416. 

RA SUGAYA K., SASANUMA S., NOHATA J., KIMURA T., FUKAGAWA T.,

RA NAKAMURA Y., ANDO A., INOKO H., IKEMURA T., MITA K.;

RT "Gene organization of human NOTCH4 and (CTG)n polymorphism in this

RT human counterpart gene of mouse proto-oncogene Int3.";

RL GENE 189:235-244(1997).

DR EMBL; D86566; D1013803; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 11.

DR PROSITE; PS01186; EGF_2; 17.

DR PROSITE; PS01187; EGF_CA; 9.

DR PFAM; PF00008; EGF; 21.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

FT NON_TER 955 955

SQ SEQUENCE 955 AA; 100017 MW; 28507B36 CRC32;

Query Match 13.9%; Score 121; DB 4; Length 955;

Best Local Similarity 43,2%; Pred. No. 4.52e-08; 

Matches 16; Conservative 6; Mismatches 12; Indels 3; Gaps 3; 

Db 401 CHGDAQCSTNPLTGSTLCLCQPGYSGPTCHQDLDECL 437 

II :M : I 111111:11 I |: : |: 
Qy 4 CH-HGQCHISD-RGEPYCLCQPGFSGHHCEQE-NPCM 37



RESULT 12 

ID Q61204 PRELIMINARY; PRT; 1687 AA. 

AC Q61204; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE NOTCH2-LIKE (EGF REPEAT TRANSMEMBRANE PROTEIN).

GN NOTCH2L.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=C57BL/6J; TISSUE=WHOLE EMBRYO;

RA SELL C., HOFF III H.B.;

RL SUBMITTED (MAY-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.

DR EMBL; U57368; G1336628; -.

DR MGD; MGI:1202397; NOTCH2L.

DR PROSITE; PS00010; ASX_HYDROXYL; 2.

DR PROSITE; PS01187; EGF_CA; 2.

DR PROSITE; PS01186; EGF_2; 5.

DR PFAM; PF00008; EGF; 7.

KW TRANSMEMBRANE; GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 1687 AA; 188528 MW; 73B9DDDC CRC32;

Query Match 13.9%; Score 121; DB 11; Length 1687;

Best Local Similarity 37.8%; Pred. No. 4.52e-08;

Matches 14; Conservative 11; Mismatches 9; Indels 3; Gaps 3;

Db 344 CQNGGTCHMLSR-DTYECTCQVGFTGKQCQWTDACLS 379

!::! Ih I : I I II Ihl :|: ::|:: 
Qy 4 CHHG-QCHISDRGEPY-CLCQPGFSGHHCEQENPCMG 38



RESULT 13 

ID 035442 PRELIMINARY; PRT; 1964 AA. 

AC 035442; 

DT 01-JAN-1998 (TREMBLREL. 05, CREATED)

DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE NOTCH4.

GN NOTCH4.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RA ROWEN L., MAHAIRAS 6., QIN S,, AHEARN M.E., DANKERS C, LASKY S., 

RA LORETZ Ci, SCHMIDT S., TIPTON S., TRAICOFF R., ZACKRONE K., HOOD L; 

RL SUBMITTED (OCT-1997) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; AF030001; G2564947; 

DR PROSITE; PS00010; ASX_HYDROXYL; 11. 

DR PROSITE; PS01186; EGF_2; 21. 

DR PROSITE; PS01187; EGF_CA; 9. 

DR PFAM; PF00008; EGF; 26. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 2. 

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 1964 AA; 206699 MW; CE2CA3B6 CRC32; 

Query Match 13.9%; Score 121; DB 11; Length 1964; 

Best Local Similarity 40.5%; Pred. No. 4.52e-08; 

Matches 15; Conservative 8; Mismatches 13; Indels 1; Gaps 1; 

Db 443 CEHGGSCINTPGSFNCLCLPGYTGSRCEADHNECLSO 479 

III 1:1 III !':: :|| : I |::: 
Qy 4 CHHGQCHISDRGEPYCLCQPGFSGHHCEQE-NPCMGE 39 



RESULT 14 

ID Q99940 PRELIMINARY; PRT; 1999 AA. 
AC Q99940; 

DT 01-MAY-1997 (TREMBLREL. 03, CREATED) 

DT 01-MAY-1997 (TREMBLREL. 03, LAST SEQUENCE UPDATE) 
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 
DE NOTCH4. 
GN NOTCH4. 
OS HOMO SAPIENS (HUMAN) . 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 
OC CATARRHINI; HOMINIDAE; HOMO. 
RN [1] 

RP SEQUENCE FROM N.A. 

RA LI L., HUANG G., BANTA A., DENG Y., CHEN L., PHAM Q., ROWEN L., 
RA HOOD L.; 

RL SUBMITTED (FEB-1997) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; U89335; G1841543; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 11. 

DR PROSITE; PS01186; EGF_2; 21. 

DR PROSITE; PS01187; EGF_CA; 9. 

DR PFAM; PF00008; EGF; 26. 

DR PFAM; PF00023; ank; 5. 

DR PFAM; PF00066; notch; 2. 

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 1999 AA; 209134 MW; 0680278E CRC32; 



Query Match 13.9%; Score 121; DB 4; Length 1999; 

Best Local Similarity 43.2%; Pred. No. 4.52e-08; 

Matches 16; Conservative 6; Mismatches 12; Indels 3; Gaps 3; 

Db 400 CHGDAQCSTNPLTGSTLCLCQPGYSGPTCHQDLDECL 436 
II :H : I lllllhll I I: : I: 
Qy 4 CH-HGQCHISD-RGEPYCLCQPGFSGHHCEQE-NPCM 37 



RESULT 15 

ID O00306 PRELIMINARY; PRT; 2003 AA. 

AC O00306; 

DT 01-JUL-1997 (TREMBLREL. 04, CREATED) 

DT 01-JUL-1997 (TREMBLREL. 04, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE NOTCH4. 

GN HNOTCH4. 

OS HOMO SAPIENS (HUMAN) . 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 

OC CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=BONE MARROW, HEART; 

RA LI L., HUANG G., BANTA A., YU D., ROWEN L., HOOD L.; 

RL SUBMITTED (MAR-1997) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; 095299; G2072309; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 11. 

DR PROSITE; PS01186; EGF_2; 21. 

DR PROSITE; PS01187; EGF_CA; 9. 

DR PFAM; PF00008; EGF; 26. 

DR PFAM; PF00023; ank; 5. 

DR PFAM; PF00066; notch; 2. 

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 2003 AA; 209620 MW; 518CFE96 CRC32; 



Query Match 13.9%; Score 121; DB 4; Length 2003; 

Best Local Similarity 43.2%; Pred. No. 4.52e-08; 

Matches 16; Conservative 6; Mismatches 12; Indels 3; Gaps 3; 

Db 401 CHGDAQCSTNPLTGSTLCLCQPGYSGPTCHQDLDECL 437 

II :ll : I 111111:11 I I: : |: 
Qy 4 CH-HGQCHISD-RGEPYCLCQPGFSGHHCEQE-NPCM 37 



Search completed: Fri May 28 09:26:46 1999 
Job time : 41 secs. 
Release 3.1A John F. Collins, Biocomputing Research Unit. 
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm 

Run on: Fri May 28 09:28:42 1999; MasPar time 7.62 Seconds 

374.148 Million cell updates/sec 

Tabular output not generated. 



Title:            >US-09-191-647-12 
Description:      (1-134) from US09191647.pep 
Perfect Score:    971 
Sequence:         1 HLRVLQLMENRISTIERGAF SFNHMPKLRTFRLHSNNLYC 134 

Scoring table:    PAM 150 
                  Gap 11 

Searched:         170751 seqs, 21266608 residues 

Post-processing:  Minimum Match 0% 
                  Listing first 45 summaries 

Database:         i-geneseq35 
                  1:part1 2:part2 3:part3 4:part4 5:part5 6:part6 7:part7 
                  8:part8 9:part9 10:part10 11:part11 12:part12 13:part13 
                  14:part14 15:part15 16:part16 17:part17 18:part18 
                  19:part19 20:part20 21:part21 22:part22 23:part23 
                  24:part24 25:part25 26:part26 27:part27 28:part28 
                  29:part29 30:part30 31:part31 32:part32 33:part33 
                  34:part34 35:part35 36:part36 37:part37 38:part38 
                  39:part39 

Statistics:       Mean 30.391; Variance 166.[illegible]; scale 0.182 



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
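
The percentage columns in this report appear to follow directly from the raw counts: Query Match looks like the score divided by the perfect score, and Best Local Similarity looks like the number of identical positions divided by the number of aligned columns (matches + conservative + mismatches + indels). This is inferred from the figures printed in the listing, not from the program's documentation, and the function names below are hypothetical. A minimal Python sketch under those assumptions:

```python
def query_match_percent(score: int, perfect_score: int) -> float:
    """Score as a percentage of the query's perfect score (assumed formula)."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_percent(matches: int, conservative: int,
                                  mismatches: int, indels: int) -> float:
    """Identical positions as a percentage of all aligned columns (assumed formula)."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


# Figures for the top hit of this search (score 779, perfect score 971;
# 104 matches, 18 conservative, 12 mismatches, 0 indels).
print(query_match_percent(779, 971))
print(best_local_similarity_percent(104, 18, 12, 0))
```

For the top hit these give 80.2 and 77.6, which agree with the percentages the report prints for that entry.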



Result         Query 
   No.  Score  Match  Length  DB  ID      Description            Pred. No. 

     1    779   80.2    1534  30  W46966  Amino acid sequence o  4.48e-49 
     2    357   36.8    1480   5  R25079  Drosophila SLIT prote  3.21e-17 
     3    267   27.5    1091  27  W41641  Sequence used in dete  1.16e-10 
     4    258   26.6     560  12  R71294  Human glycoprotein V.  5.12e-10 
     5    252   26.0     605  17  R85888  WD-40 domain-contg. i  1.38e-09 
     6    242   24.9     603  17  R85889  WD-40 domain-contg. r  7.10e-09 
     7    220   22.7     353   1  R05160  Sequence of human bon  2.56e-07 
     8    215   22.1     369  15  R87952  Human neurotrophic bi  5.75e-07 
     9    213   21.9     368   1  R05159  Sequence of human bon  7.94e-07 
    10    213   21.9     369  15  R87951  Rat neurotrophic bigl  7.94e-07 
    11    210   21.6     332  15  R87953  Bovine neurotrophic b  1.29e-06 
    12    196   20.2     234   8  R42265  Decorin sequence PT-7  1.22e-05 
    13    196   20.2     280   8  R42266  Decorin sequence PT-7  1.22e-05 
    14    196   20.2     305   8  R42267  Decorin sequence PT-7  1.22e-05 
    15    196   20.2     331   8  R42260  Mature decorin PT-65,  1.22e-05 
    16    196   20.2     342  17  R89439  Human recombinant dec  1.22e-05 
    17    196   20.2     345  23  W09405  Pineal gland specific  1.22e 
    18    196   20.2    1388  18  R89471  Collagen/decorin fusi  1.22e 
    19    194   20.0     186   8  R42264  Decorin sequence PT-7  1.67e 
    20    194   20.0     904  39  W86351  Human DNAX toll-like   1.67e 
    21    194   20.0    1112  16  R85298  Tomato pathogen resis  1.67e 
    22    194   20.0    1112  16  R85299  Tomato pathogen resis  1.67e 
    23    192   19.8     196   5  R29102  Drosophila SLIT prote  2.30e 
    24    191   19.7     610   9  R51116  Platelet glycoprotein  2.70e 
    25    191   19.7     610   9  R56664  Mutant platelet glyco  2.70e 
    26    191   19.7     610  23  W18201  Platelet glycoprotein  2.70e 
    27    191   19.7     610  17  R89436  Mutated platelet glyc  2.70e 
    28    184   18.9     976  21  W13408  Arabidopsis thaliana   8.20e 
    29    180   18.5     293   1  P91368  45 kDa amino terminal  1.54e 
    30    177   18.2     799  39  W86352  Human DNAX toll-like   2.48e 
    31    177   18.2     837  39  W86361  Human DNAX toll-like   2.48e 
    32    172   17.7     376   7  R36773  Human fibromodulin.    5.43e 
    33    172   17.7     376  24  W26404  Human fibromodulin.    5.43e 
    34    162   16.7     614  32  W58382  Human secreted protei  2.58e 
    35    159   16.4     784  29  W48245  Human pro-tumour necr  4.11e 
    36    159   16.4     784  39  W86350  Human DNAX toll-like   4.11e 
    37    159   16.4     784  39  W90069  Human TNF-alpha conve  4.11e 
    38    158   16.3    1874  35  W64518  Adenylate cyclase pro  4.79e 
    39    157   16.2     394  39  W86363  Mouse DNAX toll-like   5.59e 
    40    156   16.1     139   8  R42263  Decorin sequence PT-7  6.53e 
    41    156   16.1    1045  39  W86354  Human DNAX toll-like   6.53e 
    42    154   15.9     644  39  W82318  Human 7-transmembrane  8.88e 
    43    150   15.4     764   5  R24244  Rat thyrotropin recep  1.64e 
    44    149   15.3     342   6  R30492  N-terminal of LH rece  1.91e 
    45    148   15.2     762  21  W14778  Human TSH receptor.    2.23e 



RESULT 1 

ID W46966 standard; Protein; 1534 AA. 
AC W46966; 
DT 06-JUL-1998 (first entry) 
DE Amino acid sequence of a human slit-like polypeptide. 
KW Slit-like protein; human; diagnosis; treatment; brain-specific disease; 
KW cancer; antibody. 
OS Homo sapiens. 
FH Key Location/Qualifiers 
FT Peptide 1..26 
FT /note= "signal peptide" 
FT Protein 27..1534 
FT /note= "mature protein" 
PN J10087699-A. 
PD 07-APR-1998. 
PF 15-JUL-1997; 205351. 
PR 16-JUL-1996; JP-186219. 
PA (ASAH ) ASAHI KASEI KOGYO KK. 
DR WPI; 98-267127/24. 
DR N-PSDB; V16978. 
PT Human Slit-like protein - useful for diagnosis and treatment of 
PT brain-specific diseases and cancers 
PS Disclosure; Pages 31-35; 45pp; Japanese. 
CC The present sequence represents a novel human slit-like protein (the 
CC mature protein is claimed in Claim 1). The slit-like polypeptide is 
CC useful for diagnosis and treatment of brain-specific diseases and 
CC cancers. Antibodies directed against the protein, or its fragments 
CC can also be used for diagnosing cancer. 
SQ Sequence 1534 AA; 



Query Match 80.2%; Score 779; DB 30; Length 1534; 

Best Local Similarity 77.6%; Pred. No. 4.48e-49; 

Matches 104; Conservative 18; Mismatches 12; Indels 0; Gaps 0; 

Db 86 qlrvlqlmenqigavergafddmkelerlrlnrnqlhmlpellfqnnqalsrldlsenai 145 

:ini!llll:|:::||ilM:|||l!llllh |:::||||| I IIIIMI I 
Qy 1 HLRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQI 60 

Db 146 qaiprkafrgatdlknlrldknqiscieegafralrglevltlnnnnittipvssfnhmp 205 
MINIUM |:|||;|| II 1 1 1 1 1 : 1 1 1 1 II I llllllllllll ::|:|MIII 
Qy 61 QAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMP 120 
Db 206 klrtfrlhsnhlfc 219 
Qy 121 KLRTFRLHSNNLYC 134 



RESULT 2 

ID R25079 standard; Protein; 1480 AA. 

AC R25079; 

DT 05-JAN-1993 (first entry) 

DE Drosophila SLIT protein involved in axon pathway development. 

KW Neurogenesis; EGF-like repeats; epidermal growth factor; TAGON; 

KW embryonic CNS; leucine-rich repeat; Flank-LRR-Flank; 

KW midline glial cells; axonogenesis; cell-cell interaction; ss. 

OS Drosophila melanogaster. 

FH Key Location/Qualifiers 

FT peptide 1..36 
FT /label= signal 
FT domain 73..294 
FT /label= Flank_LRR_Flank_1 
FT /note= "mediates adhesive events" 
FT domain 295..518 
FT /label= Flank_LRR_Flank_2 
FT /note= "mediates adhesive events" 
FT domain 519..714 
FT /label= Flank_LRR_Flank_3 
FT /note= "mediates adhesive events" 
FT domain 715..910 
FT /label= Flank_LRR_Flank_4 
FT /note= "mediates adhesive events" 
FT region 911..1150 
FT /label= Tandem_EGF_like_repeats 
FT /note= "involved in protein-protein interactions" 
FT region 1353..1393 
FT /label= 7th_EGF_like_repeat 
FT /note= "involved in receptor-ligand interactions" 
FT region 1394..1404 
FT /label= alternative_splice_segment 
FT /note= "developmentally regulated" 
FT region 1405..1480 
FT /label= C-terminal_region 

PN WO9210518-A. 

PD 25-JUN-1992. 

PF 27-NOV-1991; U09055. 

PR 07-DEC-1990; US-624135. 

PA (UYYA ) UNIV YALE. 

PI Artavanis-Tsakonas S, Rothberg JH; 

DR WPI; 92-234590/28. 

DR N-PSDB; Q25811. 
PT SLIT protein and sequence elements for treating 
PT neuro-degenerative disease - useful for Alzheimer's disease, 

PT nerve damage and Parkinson's disease, for diagnosis of cancer 

PS Claim 1; Page 84-89; 122pp; English. 

CC The SLIT protein is necessary for normal development of the midline 

CC of the CNS, partic. the midline glial cells, and for the 

CC concomitant formation of the commissural axon pathways. The process 

CC is dependent on the level of SLIT protein expression. It appears 

CC that SLIT protein is excreted by the midline glial cells where it 

CC is synthesised and is eventually associated with the surfaces of 

CC axons that traverse them. The SLIT protein is tightly localised to 

CC the muscle attachment sites and to the sites of contact between 

CC adjacent pairs of cardioblasts as they coalesce to form the lumen 

CC of the larval heart. The SLIT protein defines a new set of 

CC molecules (TAGONS) which play a key role in axon outgrowth and 

CC pathfinding. SLIT can be used as a nerve regenerative in 

CC neurodegenerative diseases such as Alzheimer's Disease, spinal cord 

CC injuries, brain injuries, crushed (optic) nerve, amyotrophic lateral 

CC sclerosis, diabetes-caused nerve damage, Parkinson's Disease, 

CC strokes, epilepsy, multiple sclerosis, paraplegia, retinal 

CC degeneration and facial nerve damage. The 4 "Flank-Leucine-rich 

CC region-Flank" domains, the C-terminal region and part of the 

CC alternative splice segment (i.e. GEGSTEPFTVT) are all individually 

CC claimed as are molecules comprising at least 1 Flank-LRR-Flank domain 

CC and at least 1 EGF-like repeat element from the SLIT protein. 

CC See also R29102. 

SQ Sequence 1480 AA; 

Query Match 36.8%; Score 357; DB 5; Length 1480; 

Best Local Similarity 39.2%; Pred. No. 3.21e-17; 

Matches 51; Conservative 32; Mismatches 47; Indels 0; Gaps 0; 

[Alignment: Db 105 / Qy 5, Db 165 / Qy 65, Db 225 / Qy 125; sequence rows not recoverable] 

RESULT 3 

ID W41641 standard; Protein; 1091 AA. 

AC W41641; 

DT 27-APR-1998 (first entry) 

DE Sequence used in detection method. 

KW Detection; mouse; murine. 

OS Mus sp. 

PN J09107971-A. 

PD 28-APR-1997. 

PF 19-OCT-1995; 270822. 

PR 19-OCT-1995; JP-270822. 

PA (TANA ) TANABE SEIYAKU CO. 

PA (TOYA/) TOYAMA S. 

DR WPI; 97-292464/27. 

DR N-PSDB; V04445. 

PT Detection of genes, useful for cloning genes of high and low 

PT expression - by homogenising prepared ds-cDNA pool from sample for 

PT comparison with each other to remove specific DNA fragment 

PS Claim 9; Pages 13-16; 18pp; Japanese. 

CC The present sequence was used in the development of a novel method 

CC for detecting genes showing difference in expression quantity 

CC between several samples. The method comprises preparing a double 

CC stranded cDNA pool of a standard organism sample, and homogenising 

CC the contents of each DNA fragment contained in the resultant cDNA 

CC pool to prepare a content homogenised standard cDNA pool. Double 

CC stranded cDNA pools derived from each sample for several organism 

CC samples are compared with each other, and the DNA fragment 

CC associated with the DNA fragment in the cDNA pool from the content 

CC homogenised standard cDNA pool is removed to give a remaining cDNA 

CC pool for each of the samples to be compared. The method can clone 

CC gene groups of high and low expression level efficiently. 

SQ Sequence 1091 AA; 

Query Match 27.5%; Score 267; DB 27; Length 1091; 

Best Local Similarity 35.3%; Pred. No. 1.16e-10; 

[Alignment: Db 166 / Qy 1, Db 225 / Qy 60, Db 285 / Qy 120; match counts and sequence rows not recoverable] 



RESULT 4 

ID R71294 standard; Protein; 560 AA. 
AC R71294; 
DT 18-AUG-1995 (first entry) 
DE Human glycoprotein V. 
KW Glycoprotein V. 
OS Homo sapiens. 
FH Key Location/Qualifiers 
FT peptide 1..16 
FT 
FT modified_site 51 
FT /label= N-glycosylation site 
FT modified_site 181 
FT /label= N-glycosylation site 
FT modified_site 244 
FT /label= N-glycosylation site 
FT modified_site 267 
FT /note= "N-glycosylation site" 
FT modified_site 298 
FT /label= N-glycosylation site 
FT modified_site 312 
FT /label= N-glycosylation site 
FT modified_site 385 
FT /label= N-glycosylation site 
FT cleavage_site 476..477 
FT /note= "putative thrombin cleavage site" 
FT modified_site 499 
FT /label= N-glycosylation_site 
FT domain 520..544 
FT /note= "putative transmembrane domain" 
PN WO9502054-A. 
PD 19-JAN-1995. 
PF 07-JUL-1994; U07644. 
PR 09-JUL-1993; US-089455. 
PR 03-DEC-1993; US-162599. 
PR 10-FEB-1994; US-195006. 
PA (CORT-) COR THERAPEUTICS INC. 
PI Cazenave J, Lanza F, Phillips DR; 
DR WPI; 95-066899/09. 
DR N-PSDB; Q85594. 
PT Platelet glycoprotein V gene - useful for producing glycoprotein 
PT V (GPV) and variants and generating antibodies to GPV 
PS Disclosure; Page 45-50; 82pp; English. 
CC Genomic clones were isolated from a human fibroblast library in 
CC lambda Fix using a 748 bp 32P-labeled glycoprotein V (GPV) cDNA 
CC probe. Exon-containing fragments from positive clones were 
CC subcloned and sequenced. The full sequence of the human GPV gene 
CC is given in Q85594. 
SQ Sequence 560 AA; 

Query Match 26.6%; Score 258; DB 12; Length 560; 

Best Local Similarity 32.8%; Pred. No. 5.12e-10; 

Matches 43; Conservative 37; Mismatches 51; Indels 0; Gaps 0; 

Db 76 lqrlmisdshisavapgtfsdliklktlrlsrnkithlpgalldkmvlleqlfldhnalr 135 
I: 1 : :::M:: hi 1! 1 |||:||:: :| |: Ml :: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 136 gidqnmfqklvnlqelalnqnqldflpaslftnlenlklldlsgnnlthlpkgllgaqak 195 
:| :: h I:: :| |; ||: : : 1 | :| :| |: ||;|:|: : ; :| 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121 

Db 196 lerlllhsnrl 206 
1 : Itll 1 
Qy 122 LRTFRLHSNNL 132 



RESULT 5 

ID R85888 standard; Protein; 605 AA. 

AC R85888; 

DT 13-SEP-1996 (first entry) 

DE WD-40 domain-contg. insulin-like growth factor binding protein. 

KW WD40 repeat region; beta-transducin; protein-protein interaction; drug; 

KW intracellular signalling; protein kinase C; homology; motif; modulator; 



KW receptors of activated protein kinase; enzyme activity; isozyme; human. 

OS Synthetic. 

PN WO9521252-A2. 

PD 10-AUG-1995. 

PF 31-JAN-1995; U01210. 

PR 01-FEB-1994; US-190802. 

PA (STRD ) UNIV LELAND STANFORD JUNIOR. 

PI Mochly-Rosen D, Ron D; 

DR WPI; 95-283772/37. 

PT New WD-40 (beta-transducin)-derived polypeptide(s) - which alter the 

PT activity of a protein, eg, protein kinase C, which interacts with a 

PT protein contg. a WD-40 region. 

PS Example 5; Page 122-125; 351pp; English. 

CC Proteins R85851-92 are proteins which contain at least one WD-40 (also 

CC called beta-transducing homologous) amino acid repeat motifs. The WD-40 

CC regions are involved in protein-protein interactions between proteins 

CC involved in intracellular signalling. An example of such an interaction 

CC is between protein kinase C and receptors of activated protein kinase 

CC (RACK), esp. RACK-1 (R85850). Proteins R85851-82 were isolated based on 

CC homology with beta-transducin, whereas proteins R85882-92 were isolated 

CC based on homology with the WD-40 consensus sequence (R85893). The 

CC proteins were used to construct the peptides R84928-R85063 and 

CC R85786-R85842. The peptides can be used to identify target proteins 

CC contg. WD-40 motifs, as modulators of enzyme esp. isozyme, activity of 

CC proteins involved in protein-protein interaction and to screen for drugs 

CC that will affect protein-protein interaction involving WD-40 domains. 

SQ Sequence 605 AA; 

Query Match 26.0%; Score 252; DB 17; Length 605; 

Best Local Similarity 32.8%; Pred. No. 1.38e-09; 

Matches 43; Conservative 30; Mismatches 58; Indels 0; Gaps 0; 

Db 220 lreldlsrnalraikanvfvqlprlqklyldrnliaavapgaflglkalrwldlshnrva 279 
II hi 1 : :| 1 :l |::| hi : : III I :|||| |:: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 280 glledtfpgllglrvlrlshnaiaslrprtfkdlhfleelqlghnrirqlaersfeglgq 339 
" :l 1 : " hi 1 h : :|: h II 1 1 :| 1 :|: lh : 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121 

Db 340 levltldhnql 350 
1 : 1 1 1 
Qy 122 LRTFRLHSNNL 132 



RESULT 6 

ID R85889 standard; Protein; 603 AA. 

AC R85889; 

DT 13-SEP-1996 (first entry) 

DE WD-40 domain-contg. rat insulin-like growth factor binding protein. 

KW WD40 repeat region; beta-transducin; protein-protein interaction; drug; 

KW intracellular signalling; protein kinase C; homology; motif; modulator; 

KW receptors of activated protein kinase; enzyme activity; isozyme; human. 

OS Rattus rattus. 

PN WO9521252-A2. 

PD 10-AUG-1995. 

PF 31-JAN-1995; U01210. 

PR 01-FEB-1994; US-190802. 

PA (STRD ) UNIV LELAND STANFORD JUNIOR. 

PI Mochly-Rosen D, Ron D; 

DR WPI; 95-283772/37. 

PT New WD-40 (beta-transducin) -derived polypeptide(s) - which alter the 

PT activity of a protein, eg, protein kinase C, which interacts with a 

PT protein contg. a WD-40 region. 

PS Example 5; Page 125-128; 351pp; English. 

CC Proteins R85851-92 are proteins which contain at least one WD-40 (also 

CC called beta-transducing homologous) amino acid repeat motifs. The WD-40 

CC regions are involved in protein-protein interactions between proteins 

CC involved in intracellular signalling. An example of such an interaction 

CC is between protein kinase C and receptors of activated protein kinase 

CC (RACK), esp. RACK-1 (R85850). Proteins R85851-82 were isolated based on 

CC homology with beta-transducin, whereas proteins R85882-92 were isolated 

CC based on homology with the WD-40 consensus sequence (R85893). The 
CC proteins were used to construct the peptides R84928-R85063 and 

CC R85786-R85842. The peptides can be used to identify target proteins 

CC contg. WD-40 motifs, as modulators of enzyme esp. isozyme, activity of 

CC proteins involved in protein-protein interaction and to screen for drugs 

CC that will affect protein-protein interaction involving WD-40 domains. 

SQ Sequence 603 AA; 

Query Match 24.9%; Score 242; DB 17; Length 603; 

Best Local Similarity 34.1%; Pred. No. 7.10e-09; 

Matches 46; Conservative 34; Mismatches 52; Indels 3; Gaps 3; 

[Alignment: Db 240 / Qy 1, Db 300 / Qy 58, Db 360 / Qy 118; sequence rows not recoverable] 



RESULT 7 
ID R05160 standard; protein; 353 AA. 
AC R05160; 
DT 09-OCT-1990 (first entry) 
DE Sequence of human bone proteoglycan II (decorin). 
KW Osteoporosis; rheumatoid arthritis; Paget's disease; 
KW atherosclerosis; periodontal; human bone matrix; proteoglycan. 
OS Homo sapiens. 
PN US7432044-A. 
PD 17-APR-1990. 
PF 3-NOV-1989; 432044. 
PR 3-NOV-1989; US-432044. 
PA (USSH) Nat Inst of Health. 
PI Termine J; 
DR WPI; 90-178641/23. 
DR N-PSDB; Q04491. 

PT Human bone matrix DNA and proteins - 

PT used in detection, diagnosis and treatment involving skeletal 

PT and/or connective tissue disease states. 

PS Disclosure; p; English. 

CC Probes and Abs raised to the proteins can be used to determine 
CC their levels useful in diagnosis of associated connective tissue 
CC diseases states such as osteoporosis, osteo/rheumatoid arthritis, 
CC Paget's disease, atherosclerosis and periodontal disease. 
CC Proteins may also be used to induce or block biological function. 
SQ Sequence 353 AA; 

Query Match 22.7%; Score 220; DB 1; Length 353; 
Best Local Similarity 32.8%; Pred. No. 2.56e-07; 
Matches 41; Conservative 33; Mismatches 46; Indels 5; Gaps 4; 



Db 100 nlhalilvnnkiskvspgaftplvklerlylsknqlkelpekm--pkt-lqelrahenei 156 
:|: I |::|:H : III I INI |::| I :|| : : I I Ihl 
Qy 1 HLRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQI 60 

Db 157 tkvrkvtfnglnqmivielgtnplkssgiengafqgmkklsyiriadtnitkvrkvtfng 216 

: : :| I " "I I : I ll:lll:::: I : : : III:: :|| 
Qy 61 QAIPRKAFRGAVDIKNLQLDYNQI-SC-IEDGAFRALRDLEVLTLNNNNITRLSVASFNH 118 

Db 217 ltelh 221 

: I: 

Qy 119 MPKLR 123 



RESULT 8 

ID R87952 standard; Protein; 369 AA. 
AC R87952; 

DT 20-MAR-1996 (first entry) 
DE Human neurotrophic biglycan. 



KW Biglycan; proteoglycan; chondroitin sulphate; neuron protection; 
KW neurotrophic; central nervous system; CNS; memory loss; dementia; 
KW learning. 
OS Homo sapiens. 
FH Key Location/Qualifiers 
FT peptide 1..37 
FT /label= Sig_peptide 
FT region 44..60 
FT /label= Hypervariable_region 
PN WO9530432-A1. 
PD 16-NOV-1995. 
PF 09-MAY-1994; E01479. 
PR 09-MAY-1994; WO-E01479. 
PA (BOEF ) BOEHRINGER MANNHEIM GMBH. 
PI Hasenoehrl R, Huston J, Junghans U, Kappler J, Koops A; 
PI Mueller HW; 
DR WPI; 95-403938/51. 
PT Proteoglycan cpds., partic. chondroitin sulphate proteoglycan(s) - 
PT for maintain structural and function of the CNS and attenuating 
PT memory deficit(s) in the elderly and patients with dementia 
PS Claim 3; Fig 8; 60pp; English. 
CC Human biglycan (R87952) is a chondroitin sulphate proteoglycan with 
CC neurotrophic activity for brain neurons. It can be used to enhance 
CC the survival and maintain the structure and function of CNS neurons 
CC during normal ageing as well as after pathological and/or traumatic 
CC nervous system damage. It can also be used to restore function 
CC following nervous system lesions and degenerative diseases, and to 
CC improve learning efficiency and memory in the elderly and in patients 
CC with dementia. 
SQ Sequence 369 AA; 

Query Match 22.1%; Score 215; DB 15; Length 369; 

Best Local Similarity 36.4%; Pred. No. 5.75e-07; 

Matches 43; Conservative 29; Mismatches 41; Indels 5; Gaps 5; 

Db 209 lklnylriseakltgipkdlpetlnelhldhnkiqai-eledllrysklyrlglghnqir 267 

I I III I II I |: |:|::|::| : II :| ::|IN I: III: 
Qy 5 LQLMENRISTIE-RGAFQDLKE-LERLRLNRNNLQLFPELL-FLGTARLYRLDLSENQIQ 61 



Db 268 miengslsflptlrelhldnnklsrvpag-lpdlkllqvvylhsnnitkvgvndfcpm 324 

I :: :::|:|| I :| : I : |: |:|: |::||||:::| I I 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHM 119 



RESULT 9 

ID R05159 standard; protein; 368 AA. 

AC R05159; 

DT 09-OCT-1990 (first entry) 

DE Sequence of human bone proteoglycan I (biglycan). 

KW Osteoporosis; rheumatoid arthritis; Paget's disease; 

KW atherosclerosis; periodontal; human bone matrix; proteoglycan. 

OS Homo sapiens. 

PN US7432044-A. 

PD 17-APR-1990. 

PF 3-NOV-1989; 432044. 

PR 3-NOV-1989; US-432044. 

PA (USSH) Nat Inst of Health. 

PI Termine J; 

DR WPI; 90-178641/23. 

DR N-PSDB; Q04490. 

PT Human bone matrix DNA and proteins - 

PT used in detection, diagnosis and treatment involving skeletal 

PT and/or connective tissue disease states. 

PS Disclosure; p; English. 

CC Probes and Abs raised to the proteins can be used to determine 

CC their levels useful in diagnosis of associated connective tissue 

CC diseases states such as osteoporosis, osteo/rheumatoid arthritis, 

CC Paget's disease, atherosclerosis and periodontal disease. 

CC Proteins may also be used to induce or block biological function. 

SQ Sequence 368 AA; 

Query Match 21.9%; Score 213; DB 1; Length 368; 

Best Local Similarity 35.6%; Pred. No. 7.94e-07; 
Matches 42; Conservative 30; Mismatches 41; Indels 5; Gaps 5; 

Db 208 Dtlnylriseakltgipkdlpetlnelhldhnkiqai-eledllrysklyrlglghnqir 266 

I I II! I II I I: I:|::|::| : II :| ::|||l I: III: 
Qy 5 LQLMENRISTIE-RGAFQDLKE-LERLRLNRNNLQLFPELL-FLGTARLYRLDLSENQIQ 61 

Db 267 miengslsflptlrelhldnnklarvpsg-lpdlkllqvvylhsnnitkvgvndfcpm 323 

I := :::|:|| I : I : |: |:|: |::||||:::| I I 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHM 119 



RESULT 10 

ID R87951 standard; Protein; 369 AA. 

AC R87951; 

DT 20-MAR-1996 (first entry) 

DE Rat neurotrophic biglycan. 

KW Biglycan; proteoglycan; chondroitin sulphate; neuron protection; 
KW neurotrophic; central nervous system; CNS; memory loss; dementia; 
KW learning. 

OS Rattus sp. 

FH Key Location/Qualifiers 

FT peptide 1..37 

FT /label= Sig_peptide 

FT region 44..60 

FT /label= Hypervariable_region 

PN WO9530432-A1. 

PD 16-NOV-1995. 

PF 09-MAY-1994; E01479. 

PR 09-MAY-1994; WO-E01479. 

PA (BOEF ) BOEHRINGER MANNHEIM GMBH. 

PI Hasenoehrl R, Huston J, Junghans U, Kappler J, Koops A; 

PI Mueller HW; 

DR WPI; 95-403938/51. 

DR N-PSDB; T08768. 

PT Proteoglycan cpds., partic. chondroitin sulphate proteoglycan(s) - 

PT for maintain structural and function of the CNS and attenuating 

PT memory deficit(s) in the elderly and patients with dementia 

PS Claim 1; Page 44-45; 60pp; English. 

CC Rat biglycan (R87951) is a chondroitin sulphate proteoglycan with 

CC neurotrophic activity for brain neurons. Recombinant biglycan, 

CC obtd. by expression of encoding cDNA (T08768) in eukaryotic host 

CC cells, can be used to enhance the survival and maintain the structure 

CC and function of CNS neurons during normal ageing as well as after 

CC pathological and/or traumatic nervous system damage. It can also 

CC be used to restore function following nervous system lesions and 
CC degenerative diseases, and to improve learning efficiency and memory 
CC in the elderly and in patients with dementia. 
SQ Sequence 369 AA; 

Query Match 21.9%; Score 213; DB 15; Length 369; 

Best Local Similarity 35.6%; Pred. No. 7.94e-07; 

Matches 42; Conservative 30; Mismatches 41; Indels 5; Gaps 5; 

Db 209 lklnylriseakltgipkdlpetlnelhldhnkiqai-eledllrysklyrlglghnqir 267 

Mill I II I I: |:|::|::| : II :| ::|||| |: |||; 
Qy 5 LQLMENRISTIE-RGAFQDLKE-LERLRLNRNNLQLFPELL-FLGTARLYRLDLSENQIQ 61 

Db 268 miengslsflptlrelhldnnklsrvpag-lpdlkllqvvylhsnnitkvgindfcpm 324 

I :: :::|:|| I :| : I : h |:|: |::||||:::: | | 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHM 119 



RESULT 11 

ID R87953 standard; Protein; 332 AA. 

AC R87953; 

DT 20-MAR-1996 (first entry) 

DE Bovine neurotrophic biglycan. 

KW Biglycan; proteoglycan; chondroitin sulphate; neuron protection; 

KW neurotrophic; central nervous system; CNS; memory loss; dementia; 

KW learning. 

OS Bos taurus. 

FH Key Location/Qualifiers 

FT region 7..23 

FT /label= Hypervariable_region 

PN WO9530432-A1. 

PD 16-NOV-1995. 

PF 09-MAY-1994; E01479. 

PR 09-MAY-1994; WO-E01479. 

PA (BOEF ) BOEHRINGER MANNHEIM GMBH. 

PI Hasenoehrl R, Huston J, Junghans U, Kappler J, Koops A; 

PI Mueller HW; 

DR WPI; 95-403938/51. 

PT Proteoglycan cpds., partic. chondroitin sulphate proteoglycan(s) - 

PT for maintain structural and function of the CNS and attenuating 

PT memory deficit(s) in the elderly and patients with dementia 

PS Claim 3; Fig 8; 60pp; English. 

CC Bovine biglycan (R87953) is a chondroitin sulphate proteoglycan with 

CC neurotrophic activity for brain neurons. It can be used to enhance 

CC the survival and maintain the structure and function of CNS neurons 

CC during normal ageing as well as after pathological and/or traumatic 

CC nervous system damage. It can also be used to restore function 

CC following nervous system lesions and degenerative diseases, and to 

CC improve learning efficiency and memory in the elderly and in patients 

CC with dementia. 

SQ Sequence 332 AA; 

Query Match 21.6%; Score 210; DB 15; Length 332; 

Best Local Similarity 36.5%; Pred. No. 1.29e-06; 

Matches 42; Conservative 28; Mismatches 40; Indels 5; Gaps 5; 

Db 172 lklnylriseakltgipkdlpetlnelhldhnkiqai-eledllrysklyrlglghnqir 230 

I I III I II I I: I:|::h:| : II :| ::|||| |: III: 
Qy 5 LQLMENRISTIE-RGAFQDLKE-LERLRLNRNNLQLFPELL-FLGTARLYRLDLSENQIQ 61 

Db 231 miengslsflptlrelhldnnklsrvpag-lpdlkllqvvylhtnnitkvgvndf 284 

I :::|:|| I :| : I : |: |:|: |: ||||:::| | 

Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASF 116 



RESULT 12 

ID R42265 standard; Protein; 234 AA. 

AC R42265; 

DT 28-APR-1994 (first entry) 

DE Decorin sequence PT-76 (N-terminal to LRR8). 

KW leucine-rich repeat; proteoglycan; cell regulatory factor; MBP; 

KW fusion protein; maltose binding protein; tumour growth; inhibition; 

KW decorin; PG-II; PG-40; transforming growth factor-beta; TGF-beta. 

PN WO9320202-A. 

PD 14-OCT-1993. 

PF 02-APR-1993; U03171. 

PR 03-APR-1992; US-865652. 

PA (LJOL-) LA JOLLA CANCER RES FOUND. 

PI Cardenas J, Craig W, Mullen DG, Pierschbacher MD; 

PI Ruoslahti EI; 

DR WPI; 93-336910/42. 

DR N-PSDB; Q50051. 

PT Active fragments of protein esp. decorin - with cell regulatory 

PT factor domain, useful for inhibiting cell regulatory factor 

PT activity 

PS Claim 10; Page 45-46; 77pp; English. 

CC Active fragments of decorin (full-length coding sequence Q50046) 

CC were generated by PCR and fused to Maltose Binding Protein. The 

CC resulting fusion proteins were useful for inhibiting the activity of 

CC a cell regulatory factor, esp. TGF-beta, and hence for treating 

CC conditions associated with over-activity of the growth factor such

CC as certain tumours. 

SQ Sequence 234 AA; 

Query Match 20.2%; Score 196; DB 8; Length 234;

Best Local Similarity 33.3%; Pred. No. 1.22e-05;

Matches 38; Conservative 30; Mismatches 41; Indels 5; Gaps 4; 

Db 78 nlhalilvnnkiskvspgaftplvklerlylsknqlkelpekm--pkt-lqelrahenei 134 

:|: I !::hll : III I III |::| | :|| : :| ||:| 
Qy 1 HLRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQI 60




Db 135 tkvrkvtfnglnqmivielgtnplkssgiengafqgmkklsyiriadtnitsip 188 

: : :l I :: ::| I : I Ihlll:::: I : : : III ::' 
Qy 61 QAIPRKAFRGAVDIKNLQLDYNQI-SC-IEDGAFRALRDLEVLTLNNNNITRLS 112 



RESULT 13 

ID R42266 standard; Protein; 280 AA. 

AC R42266; 

DT 28-APR-1994 (first entry)

DE Decorin sequence PT-77 (N-terminal to LRR10).

KW leucine-rich repeat; proteoglycan; cell regulatory factor; MBP;

KW fusion protein; maltose binding protein; tumour growth; inhibition;

KW decorin; PG-II; PG-40; transforming growth factor-beta; TGF-beta.

PN WO9320202-A.

PD 14-OCT-1993.

PF 02-APR-1993; U03171.

PR 03-APR-1992; US-865652.

PA (LJOL-) LA JOLLA CANCER RES FOUND.

PI Cardenas J, Craig W, Mullen DG, Pierschbacher MD;

PI Ruoslahti EI;

DR WPI; 93-336910/42.

DR N-PSDB; Q50052.

PT Active fragments of protein esp. decorin - with cell regulatory

PT factor domain, useful for inhibiting cell regulatory factor 

PT activity 

PS Claim 10; Page 47-48; 77pp; English. 

CC Active fragments of decorin (full-length coding sequence Q50046) 

CC were generated by PCR and fused to Maltose Binding Protein. The 

CC resulting fusion proteins were useful for inhibiting the activity of 

CC a cell regulatory factor, esp. TGF-beta, and hence for treating 

CC conditions associated with over-activity of the growth factor such

CC as certain tumours. 

SQ Sequence 280 AA;

Query Match 20.2%; Score 196; DB 8; Length 280;

Best Local Similarity 33.3%; Pred. No. 1.22e-05;

Matches 38; Conservative 30; Mismatches 41; Indels 5; Gaps 4; 

Db 78 nlhalilvnnkiskvspgaftplvklerlylsknqlkelpekm--pkt-lqelrahenei 134 

:|: I I::|:|l : 111 I 1 1 1 1 |::| I :|| : : I I ||:| 
Qy 1 HLRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQI 60 

Db 135 tkvrkvtfnglnqmivielgtnplkssgiengafqgmkklsyiriadtnitsip 188 

: : :| I ::l I : I l|:|||:::: I : : : III :: 
Qy 61 QAIPRKAFRGAVDIKNLQLDYNQI-SC-IEDGAFRALRDLEVLTLNNNNITRLS 112



RESULT 14 

ID R42267 standard; Protein; 305 AA.

AC R42267;

DT 28-APR-1994 (first entry)

DE Decorin sequence PT-78 (N-terminal to half C-terminal).

KW leucine-rich repeat; proteoglycan; cell regulatory factor; MBP; 

KW fusion protein; maltose binding protein; tumour growth; inhibition; 

KW decorin; PG-II; PG-40; transforming growth factor-beta; TGF-beta.

PN WO9320202-A.

PD 14-OCT-1993.

PF 02-APR-1993; U03171.

PR 03-APR-1992; US-865652.

PA (LJOL-) LA JOLLA CANCER RES FOUND.

PI Cardenas J, Craig W, Mullen DG, Pierschbacher MD;

PI Ruoslahti EI;

DR WPI; 93-336910/42.

DR N-PSDB; Q50053.

PT Active fragments of protein esp. decorin - with cell regulatory 

PT factor domain, useful for inhibiting cell regulatory factor 

PT activity 

PS Claim 10; Page 49-50; 77pp; English. 

CC Active fragments of decorin (full-length coding sequence Q50046) 

CC were generated by PCR and fused to Maltose Binding Protein. The 

CC resulting fusion proteins were useful for inhibiting the activity of 

CC a cell regulatory factor, esp. TGF-beta, and hence for treating 

CC conditions associated with over-activity of the growth factor such

CC as certain tumours.

SQ Sequence 305 AA;

Query Match 20.2%; Score 196; DB 8; Length 305;

Best Local Similarity 33.3%; Pred. No. 1.22e-05;

Matches 38; Conservative 30; Mismatches 41; Indels 5; Gaps 4; 

Db 78 nlhalilvnnkiskvspgaftplvklerlylsknqlkelpekm--pkt-lqelrahenei 134 

:h I I::hll : III I MM MM I MM; : I I Ml 
Qy 1 HLRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQI 60 

Db 135 tkvrkvtfnglnqmivielgtnplkssgiengafqgmkklsyiriadtnitsip 188 

: : :| I :: ::| I : I Ihlll":: I : : : III :: 
Qy 61 QAIPRKAFRGAVDIKNLQLDYNQI-SC-IEDGAFRALRDLEVLTLNNNNITRLS 112 



RESULT 15 

ID R42260 standard; Protein; 331 AA.

AC R42260;

DT 28-APR-1994 (first entry)

DE Mature decorin PT-65.

KW leucine-rich repeat; proteoglycan; cell regulatory factor; MBP;

KW fusion protein; maltose binding protein; tumour growth; inhibition;

KW decorin; PG-II; PG-40; transforming growth factor-beta; TGF-beta.

FH Key Location/Qualifiers

FT region 1..45

FT /label= N-terminal_region

FT /note= "contains 4 Cys residues"

FT region 46..280

FT /label= repeat_region

FT /note= "contains 10 leucine-rich repeats"

FT region 281..331

FT /label= C-terminal_region

PN WO9320202-A. 

PD 14-OCT-1993. 

PF 02-APR-1993; U03171. 

PR 03-APR-1992; US-865652. 

PA (LJOL-) LA JOLLA CANCER RES FOUND. 

PI Cardenas J, Craig W, Mullen DG, Pierschbacher MD; 

PI Ruoslahti EI; 

DR WPI; 93-336910/42. 

DR N-PSDB; Q50046. 

PT Active fragments of protein esp. decorin - with cell regulatory

PT factor domain, useful for inhibiting cell regulatory factor 

PT activity 

PS Claim 10; Page 36-38; 77pp; English. 

CC Active fragments of decorin (full-length coding sequence Q50046) 

CC were generated by PCR and fused to Maltose Binding Protein. The 

CC resulting fusion proteins were useful for inhibiting the activity of 

CC a cell regulatory factor, esp. TGF-beta, and hence for treating 

CC conditions associated with over-activity of the growth factor such 

CC as certain tumours.

SQ Sequence 331 AA;

Query Match 20.2%; Score 196; DB 8; Length 331;

Best Local Similarity 33.3%; Pred. No. 1.22e-05;

Matches 38; Conservative 30; Mismatches 41; Indels 5; Gaps 4; 

Db 78 nlhalilvnnkiskvspgaftplvklerlylsknqlkelpekm--pkt-lqelrahenei 134 

M: I I : : I : I f : III I MM MM I :|| : : | | I; 

Qy 1 HLRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQI 60 

Db 135 tkvrkvtfnglnqmivielgtnplkssgiengafqgmkklsyiriadtnitsip 188 

: : :l I :: "I I : I Ihlll:::: | : : ; ||| :; 

Qy 61 QAIPRKAFRGAVDIKNLQLDYNQI-SC-IEDGAFRALRDLEVLTLNNNNITRLS 112 



Search completed: Fri May 28 09:29:40 1999 
Job time: 58 secs.
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Run 



Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Fri May 28 09:29:57 1999; MasPar time 8.33 Seconds

644.624 Million cell updates/sec

Tabular output not generated.



Title: >US-09-191-647-12
Description: (1-134) from US09191647.pep
Perfect Score: 971
Sequence: 1 HLRVLQLMENRISTIERGAF..........SFNHMPKLRTFRLHSNNLYC 134

Scoring table: PAM 150
               Gap 11

Searched: 122810 seqs, 40068593 residues



Post-processing: Minimum Match 0% 

Listing first 45 summaries 

Database: pir60 

1:pir1 2:pir2 3:pir3 4:pir4

Statistics: Mean 44.958; Variance 121.862; scale 0.369

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
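
The two percentage figures reported for each hit can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the search program; it assumes, from the numbers in this listing only, that "Query Match" is the hit score divided by the Perfect Score (971 for this query) and that "Best Local Similarity" is the number of identical positions divided by the total number of alignment columns (matches + conservative + mismatches + indels). The function names are hypothetical.

```python
def query_match_percent(score: int, perfect_score: int) -> float:
    """Hit score as a percentage of the query's self-match (perfect) score.

    Assumption inferred from the listing, not from program documentation.
    """
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_percent(matches: int, conservative: int,
                                  mismatches: int, indels: int) -> float:
    """Identical positions as a percentage of all alignment columns.

    Assumption: indel columns count toward the alignment length.
    """
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


if __name__ == "__main__":
    # (ID, score, matches, conservative, mismatches, indels) as printed
    # in the detailed records of this run.
    hits = [
        ("B36665", 366, 52, 31, 47, 0),
        ("A43318", 298, 51, 28, 52, 2),
        ("A58532", 267, 47, 37, 47, 2),
    ]
    for ident, score, m, c, x, i in hits:
        print(ident,
              query_match_percent(score, 971),
              best_local_similarity_percent(m, c, x, i))
```

Under these assumptions the computed values agree with the percentages printed for the corresponding records, which makes the relation a convenient check when a percentage has been damaged in scanning.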



                 %
Result         Query
   No.  Score  Match  Length  DB  ID      Description             Pred. No.

     1    366   37.7    1469      B36665  slit protein 2 precur   7.70e-34
     2    366   37.7    1480      A36665  slit protein 1 precur   7.70e-34
     3    298   30.7     682      A49121  cell-surface molecule   2.53e-24
     4    298   30.7     682      A43318  connectin precursor -   2.53e-24
     5    267   27.5    1091      A58532  glial cell membrane g   4.31e-20
     6    264   27.2    1134      A29944  chaoptin precursor -    1.10e-19
     7    261   26.9     907      JE0176  orphan G protein-coup   2.78e-19
     8    260   26.8     536      A34901  lysine carboxypeptida   3.79e-19
     9    258   26.6     560      A60164  platelet membrane gly   7.04e-19
    10    252   26.0     605      A41915  insulin-like growth f   4.48e-18
    11    248   25.5     605      JC5239  insulin-like growth f   1.53e-17
    12    242   24.9     603      JC1282  insulin-like growth f   9.62e-17
    13    241   24.8     603      JC6128  insulin-like growth f   1.31e-16
    14    237   24.4    1115      S40241  G protein-coupled rec   4.41e-16
    15    235   24.2     312      NBHUA2  leucine-rich alpha-2-   8.10e-16
    16    222   22.9     361      A53860  chondroadherin precur   4.09e-14
    17    222   22.9     662      S42799  garp precursor - huma   4.09e-14
    18    213   21.9     368      BGHUN   biglycan precursor -    6.00e-13
    19    213   21.9     369      S32793  biglycan precursor -    6.00e-13
    20    213   21.9     369      S20811  proteoglycan I - mous   6.00e-13
    21    210   21.6     369      S32559  biglycan precursor -    1.46e-12
    22    208   21.4    1097      A29943  Toll protein precurso   2.63e-12
    23    205   21.1     357      S24317  decorin precursor - c   6.38e-12
    24    203   20.9    1535      S46224  peroxidasin - fruit f   1.15e-11
    25    202   20.8     360      S06280  decorin precursor - b   1.54e-11
    26    201   20.7     360      I47020  decorin - rabbit        2.06e-11
    27    196   20.2     359      NBHUC8  decorin precursor - h   8.87e-11
    28    192   19.8    1256      S60461  gene flightless-I pro   2.83e-10
    29    190   19.6     354      A55454  decorin precursor - m   5.04e-10
    30    189   19.5     382      I39068  proline-arginine-ric    6.72e-10
    31    188   19.4     925      JC2033  G protein-coupled rec   8.96e-10
    32    182   18.7     424      S27783  hypothetical protein    4.99e-09
    33    180   18.5     354      S29145  decorin precursor - r   8.81e-09
    34    180   18.5     626      NBHUIA  platelet glycoprotein   8.81e-09
    35    180   18.5     661      I56258  RP105 - mouse           8.81e-09
    36    179   18.4     277      S25770  RSP-1 protein - mouse   1.17e-08
    37    178   18.3     277      I60122  rsu-1 homolog - human   1.55e-08
    38    176   18.1    1268      A49674  flightless-I homolog    2.73e-08
    39    175   18.0    1839      OYBYK   adenylate cyclase (EC   3.62e-08
    40    172   17.7     376      S55275  fibromodulin precurso   8.42e-08
    41    171   17.6    2145      JC4747  adenylate cyclase (EC   1.11e-07
    42    165   17.0     375      S05390  fibromodulin precurso   5.90e-07
    43    162   16.7     764      JC5643  thyroid stimulating h   1.35e-06
    44    159   16.4    2026      OYBY    adenylate cyclase (EC   3.07e-06
    45    157   16.2    1692      A33988  adenylate cyclase (EC   5.29e-06


RESULT 1
ENTRY B36665 #type complete
TITLE slit protein 2 precursor - fruit fly (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change 16-Dec-1998
ACCESSIONS B36665
REFERENCE A36665
   #authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.; Artavanis-Tsakonas, S.
   #journal Genes Dev. (1990) 4:2169-2187
   #title slit: an extracellular protein necessary for development of
          midline glia and commissural axon pathways contains both
          EGF and LRR domains.
   #cross-references MUID:91099665
   #accession B36665
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-1469 ##label ROT
   ##cross-references GB:X53959
GENETICS
   #gene FlyBase:sli
   ##cross-references FlyBase:FBgn0003425
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF
          homology; leucine-rich alpha-2-glycoprotein repeat
          homology; proteoglycan carboxyl-terminal homology
FEATURE
   66-91 #domain proteoglycan amino-terminal homology #label PAH1\
   101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\
   288-313 #domain proteoglycan amino-terminal homology #label PAH2\
   323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
   450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\
   512-537 #domain proteoglycan amino-terminal homology #label PAH3\
   547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
   572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
   596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
   620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
   651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\
   708-733 #domain proteoglycan amino-terminal homology #label PAH4\
   743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
   767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
   846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\
   1028-1061 #domain EGF homology #label EGF
SUMMARY #length 1469 #molecular-weight 164695 #checksum 8361

Query Match 37.7%; Score 366; DB 2; Length 1469;
Best Local Similarity 40.0%; Pred. No. 7.70e-34;
Matches 52; Conservative 31; Mismatches 47; Indels 0; Gaps 0;


Db 105 LELQGNNLTVIYETDFQRLTKLRMLQLTDNQIHTIERNSFQDLVSLERLDISNNVITTVG 164 

hi I :: I II I I 1:1 I :: : I I |||:|:| | :: 
Qy 5 LQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQAIP 64 

Db 165 RRVFKGAQSLRSLQLDNNQITCLDEHAFKGLVELEILTLNNNNLTSLPHNIFGGLGRLRA 224 

h hll :::IHI lll:|::: l|::| : 1 1 : 1 1 1 1 1 1 1 : 1 |: I : :||: 
Qy 65 RKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPKLRT 124

Db 225 LRLSDNPFAC 234 

ill I : I 
Qy 125 FRLHSNNLYC 134



RESULT 2
ENTRY A36665 #type complete
TITLE slit protein 1 precursor - fruit fly (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change 24-Sep-1998
ACCESSIONS A36665; S13523
REFERENCE A36665
   #authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.; Artavanis-Tsakonas, S.
   #journal Genes Dev. (1990) 4:2169-2187
   #title slit: an extracellular protein necessary for development of
          midline glia and commissural axon pathways contains both
          EGF and LRR domains.
   #cross-references MUID:91099665
   #accession A36665
   ##status preliminary
   ##molecule_type mRNA
   ##residues 1-1480 ##label ROT
   ##cross-references GB:X53959; NID:g8614; PID:g8615
GENETICS
   #gene FlyBase:sli
   ##cross-references FlyBase:FBgn0003425
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF
          homology; leucine-rich alpha-2-glycoprotein repeat
          homology; proteoglycan carboxyl-terminal homology
KEYWORDS alternative splicing
FEATURE
   66-91 #domain proteoglycan amino-terminal homology #label PAH1\
   101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\
   288-313 #domain proteoglycan amino-terminal homology #label PAH2\
   323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
   450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\
   512-537 #domain proteoglycan amino-terminal homology #label PAH3\
   547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
   572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
   596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
   620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
   651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\
   708-733 #domain proteoglycan amino-terminal homology #label PAH4\
   743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
   767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
   791-814 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR17\
   815-838 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR18\
   846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\
   1028-1061 #domain EGF homology #label EGF
SUMMARY #length 1480 #molecular-weight 165751 #checksum 900



Query Match 37.7%; Score 366; DB 2; Length 1480;

Best Local Similarity 40.0%; Pred. No. 7.70e-34;

Matches 52; Conservative 31; Mismatches 47; Indels 0; Gaps 0;


Db 


105 


Qy 


5 


Db 


165 


Qy 


65 


Db 


225 



Ml I :: I 



I I lll:|:| I 



I: hll :::IMI 1 1 1 : 1 : : : lh:| :lhllllllhl h I 



225 LRLSDNPFAC 234 
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Qy 125 FRLHSNNLYC 134 



RESULT 3
ENTRY A49121 #type complete
TITLE cell-surface molecule connectin - fruit fly (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 19-Dec-1993 #sequence_revision 18-Nov-1994 #text_change 20-Mar-1998
ACCESSIONS A49121
REFERENCE A49121
   #authors Gould, A.P.; White, R.A.H.
   #journal Development (1992) 116:1163-1174
   #title Connectin, a target of homeotic gene control in Drosophila.
   #cross-references MUID:93202002
   #accession A49121
   ##status preliminary
   ##molecule_type nucleic acid
   ##residues 1-682 ##label GOU
   ##cross-references GB:X68701; NID:g7737; PID:g7738
   ##experimental_source embryo
   ##note sequence extracted from NCBI backbone (NCBIN:127661, NCBIP:127664)
GENETICS
   #gene FlyBase:Con
   ##cross-references FlyBase:FBgn0005775
CLASSIFICATION #superfamily leucine-rich alpha-2-glycoprotein repeat homology
FEATURE
   199-222 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   223-246 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   247-270 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   271-294 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   295-318 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   319-342 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   343-366 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7
SUMMARY #length 682 #molecular-weight 75922 #checksum 7093



Query Match 30.7%; Score 298; DB 2; Length 682;
Best Local Similarity 38.3%; Pred. No. 2.53e-24;
Matches 51; Conservative 28; Mismatches 52; Indels 2; Gaps 2;

Db 223
Qy 1
Db 282
Qy 60
Db 342
Qy 120



:l! I I hi -I ll-l I III II lh : I II III hh II 



II I :: I I I I: I I |::| I |:| 



hi: h:| I 



RESULT 4 

ENTRY A43318 #type complete
TITLE connectin precursor - fruit fly (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 31-Dec-1993 #sequence_revision 31-Dec-1993 #text_change 24-Sep-1998
ACCESSIONS A43318; S28464
REFERENCE A43318
   #authors Nose, A.; Mahajan, V.B.; Goodman, C.S.
   #journal Cell (1992) 70:553-567
   #title Connectin: a homophilic cell adhesion molecule expressed on a
          subset of muscles and the motoneurons that innervate them
          in Drosophila.
   #cross-references MUID:92370678
   #accession A43318
   ##molecule_type mRNA
   ##residues 1-682 ##label NOS
   ##cross-references GB:M96647; NID:g157083; PID:g157084
   ##note sequence extracted from NCBI backbone (NCBIN:111422, NCBIP:111423)
REFERENCE S28464
   #authors Gould, A.P.; White, R.A.H.
   #submission submitted to the EMBL Data Library, October 1992
   #description Connectin a target of homeotic gene control in drosophila.
   #accession S28464
   ##molecule_type mRNA
   ##residues 1-630,'G',632-673,675-678,'M',679-682 ##label GOU
   ##cross-references EMBL:X68701; NID:g7737; PID:g7738
GENETICS
   #gene FlyBase:Con
   ##cross-references FlyBase:FBgn0005775
CLASSIFICATION #superfamily leucine-rich alpha-2-glycoprotein repeat homology
FEATURE
   1-26 #domain signal sequence #status predicted #label SIG\
   27-682 #product connectin #status predicted #label MAT
SUMMARY #length 682 #molecular-weight 75991 #checksum 7269

Query Match 30.7%; Score 298; DB 2; Length 682; 

Best Local Similarity 38.3%; Pred. No. 2.53e-24; 

Matches 51; Conservative 28; Mismatches 52; Indels 2; Gaps 2; 

Db 223 RLRELNLEHNQIFEMDRYAFRNL-PLCERLFLNNNNISTLHEGLFADMARLTFLNLAHNQ 281

:N I I hi ::| 1 1 : : I I Ml II lh : I II III hh II 
Qy 1 HLRVLQLMENRISTIERGAFQDLKEL-ERLRLNRNNLQLFPELLFLGTARLYRLDLSENQ 59 

Db 282 INVLTSEIFRGLGNLNVLKLTRNNLNFIGDTVFAELWSLSELELDDNRIERISERALDGL 341 

I : III ::: I I | :: | | | |: | | :; | : ::: : 
Qy 60 IQAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHM 119 

Db 342 NTLRTLNLRNNLL 354 

hi: h:l I 
Qy 120 PKLRTFRLHSNNL 132 



RESULT 5 

ENTRY A58532 #type complete
TITLE glial cell membrane glycoprotein LIG-1 precursor - mouse
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 11-Apr-1997 #sequence_revision 11-Apr-1997 #text_change 17-Mar-1999
ACCESSIONS A58532
REFERENCE A58532
   #authors Suzuki, Y.; Sato, N.; Tohyama, M.; Wanaka, A.; Takagi, T.
   #journal J. Biol. Chem. (1996) 271:22522-22527
   #title cDNA cloning of a novel membrane glycoprotein that is
          expressed specifically in glial cells in the mouse brain;
          LIG-1, a protein with leucine-rich repeats and
          immunoglobulin-like domains.
   #cross-references MUID:96394313
   #accession A58532
   ##status preliminary; translated from GB/EMBL/DDBJ
   ##molecule_type mRNA
   ##residues 1-1091 ##label SUZ
   ##cross-references GB:D78572; NID:g1545806; PID:g1545807
CLASSIFICATION #superfamily leucine-rich alpha-2-glycoprotein repeat
          homology; proteoglycan amino-terminal homology;
          proteoglycan carboxyl-terminal homology
FEATURE
   36-61 #domain proteoglycan amino-terminal homology #label PAH\
   71-94 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   95-117 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   118-141 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   142-165 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   166-189 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   191-213 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   214-237 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   238-261 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   262-285 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   286-309 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
   310-333 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
   334-357 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
   358-381 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
   385-408 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
   409-432 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
   440-485 #domain proteoglycan carboxyl-terminal homology #label PCS1
SUMMARY #length 1091 #molecular-weight 119283 #checksum 7937

Query Match 27.5%; Score 267; DB 2; Length 1091;

Best Local Similarity 35.3%; Pred. No. 4.31e-20;

Matches 47; Conservative 37; Mismatches 47; Indels 2; Gaps 2;

Db 166 RIRELNLASNRISILESGAFDGLSRSLLTLRLSKNRITQLPVKAF-KLPRLTQLDLNRNR 224

:M I I Nil : III: I : | |||::| ; ;| I :| :|||: |:
Qy 1 HLRVLQLMENRISTIERGAFQDL-KELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQ 59

Db 225 IRLIEGLTFQGLDSLEVLRLQRNNISRLTDGAFWGLSKMHVLHLEYNSLVEVNSGSLYGL 284

h I :|:| : |:|: I II : 1 1 1 1 : : I ; || |: |:: :: ;|: :
Qy 60 IQAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHM 119

Db 285 TALHQLHLSNNSI 297

I: ::| :|::
Qy 120 PKLRTFRLHSNNL 132



RESULT 6
ENTRY A29944 #type complete
TITLE chaoptin precursor - fruit fly (Drosophila melanogaster)
ALTERNATE_NAMES photoreceptor cell-specific membrane protein
ORGANISM #formal_name Drosophila melanogaster
DATE 15-Dec-1988 #sequence_revision 15-Dec-1 #text_change 24-Oct-1997
ACCESSIONS A29944; A21123
REFERENCE A29944
   #authors Reinke, R.; Krantz, D.E.; Yen, D.; Zipursky, S.L.
   #journal Cell (1988) 52:291-301
   #title Chaoptin, a cell surface glycoprotein required for Drosophila
          photoreceptor cell morphogenesis, contains a repeat motif
          found in yeast and human.
   #cross-references MUID:88135762
   #accession A29944
   ##molecule_type DNA
   ##residues 1-1134 ##label REI
   ##cross-references GB:M19008; GB:M19009; GB:M19010; GB:M19011;
          GB:M19012; GB:M19013; GB:M19014; GB:M19015;
          GB:M19016; GB:M19017; NID:g157094; PID:g157098
REFERENCE A21123
   #authors Zipursky, S.L.; Venkatesh, T.R.; Teplow, D.B.; Benzer, S.
   #journal Cell (1984) 36:15-26
   #title Neuronal development in the Drosophila retina: monoclonal
          antibodies as molecular probes.
   #cross-references MUID:84106810
   #accession A21123
   ##molecule_type protein
   ##residues 31-43,'HX',46-49,'H' ##label ZIP
GENETICS
   #gene FlyBase:chp
   ##cross-references FlyBase:FBgn0000313
   #introns 1/3; 80/3; 318/3; 377/2; 422/2; 702/1; 745/3; 831/2; 998/2
CLASSIFICATION #superfamily chaoptin; leucine-rich alpha-2-glycoprotein
          repeat homology
KEYWORDS cell adhesion; glycoprotein; membrane protein
FEATURE
   1-29 #domain signal sequence #status predicted #label SIG\
   30-1134 #product chaoptin #status predicted #label MAT\
   80-102 #domain leucine-rich alpha-2-glycoprotein repeat homology #status atypical #label LRR1\
   ... #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   128-151 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   152-175 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   177-200 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   201-224 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   226-249 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   250-273 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   279-302 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   303-325 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
   326-349 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
   351-374 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
   375-399 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
   401-424 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
   428-451 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
   453-476 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
   477-500 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR17\
   ... #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR18\
   527-550 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR19\
   551-574 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR20\
   577-600 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR21\
   601-624 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR22\
   625-648 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR23\
   649-672 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR24\
   673-696 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR25\
   708-731 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR26\
   733-756 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR27\
   757-780 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR28\
   781-804 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR29\
   805-827 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR30\
   828-851 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR31\
   854-877 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR32\
   879-902 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR33\
   903-926 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR34\
   928-948 #domain leucine-rich alpha-2-glycoprotein repeat homology #status atypical #label LR35\
   949-972 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR36\
   973-995 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR37\
   996-1019 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR38\
   1021-1044 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR39\
   1056-1080 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR40
SUMMARY #length 1134 #molecular-weight 130719 #checksum 332

Query Match 27.2%; Score 264; DB 1; Length 1134;

Best Local Similarity 35.6%; Pred. No. 1.10e-19;

Matches 48; Conservative 35; Mismatches 47; Indels 5; Gaps 5; 

Db 80 KVFMLHMENTGLREIEP-YFLQSTGMYRLKISGNHLTEIPDDAFTGLERSLWELILPQND 138 

:|: I III : II I : : II::: |:| :|: I I I I I |::|: 
Qy 3 RVLQL-MENR-ISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTAR-LYRLDLSENQ 59 

Db 139 LVEIPSKSLRHWKLRHLDLGYNHITHIQHDSFRGLEDSLQTLILRENCISQLMSHSFSG 198

: II h:l :::hl 1 1 : 1 : h Hhl I h I I :| |::| ||: 
Qy 60 IQAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRD-LEVLTLNNNNITRLSVASFNH 118 



Db 199 LIIIETLDLSGNNLF 213 

: I I: I :|||: 
Qy 119 MPKLRTFRLHSNNLY 133 



RESULT 7
ENTRY           JE0176 #type complete
TITLE           orphan G protein-coupled receptor precursor - human
ORGANISM        #formal_name Homo sapiens #common_name man
DATE            03-Jul-1998 #sequence_revision 10-Jul-1998 #text_change
                  17-Mar-1999
ACCESSIONS      JE0176
REFERENCE       JE0176
   #authors     McDonald, T.; Wang, R.; Bailey, W.; Xie, G.; Chen, F.;
                  Caskey, C.T.; Liu, Q.
   #journal     Biochem. Biophys. Res. Commun. (1998) 247:266-270
   #title       Identification and cloning of an orphan G protein-coupled
                  receptor of the glycoprotein hormone receptor subfamily.
   #cross-references MUID:98308104
   #accession   JE0176
      ##molecule_type mRNA
      ##residues 1-907 ##label MCD
      ##cross-references GB:AF062006
COMMENT         This protein is a receptor for a novel class of glycoprotein
                  ligands.
GENETICS
   #gene        HG38
   #map_position 12q22-23
FEATURE
   1-21             #domain signal sequence #status predicted #label SIG\
   562-583          #domain transmembrane #status predicted #label TM1\
   594-616          #domain transmembrane #status predicted #label TM2\
   639-660          #domain transmembrane #status predicted #label TM3\
   681-701          #domain transmembrane #status predicted #label TM4\
   725-744          #domain transmembrane #status predicted #label TM5\
   768-791          #domain transmembrane #status predicted #label TM6\
   803-824          #domain transmembrane #status predicted #label TM7
SUMMARY             #length 907 #molecular-weight 99997 #checksum 8790

Query Match 26.9%; Score 261; DB 2; Length 907; 

Best Local Similarity 36.6%; Pred. No. 2.78e-19; 

Matches 48; Conservative 24; Mismatches 59; Indels 0; Gaps 0; 

Db 116 LKVLMLQNNQLRHVPTEALQNLRSLQSLRLDANHISYVPPSCFSGLHSLRHLWLDDNALT 175 

Ml I :h: : |:|:|: |: III: |:: I II I :| I :| : 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 176 EIPVQAFRSLSALQAMTLALNKIHHIPDYAFGNLSSLWLHLHNNRIHSLGKKCFDGLHS 235 

II llh I I I I I II I I II Ml I |: |: : 

Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121 



Db 236 LETLDLNYNNL 246 

I I: I: III 
Qy 122 LRTFRLHSNNL 132 



RESULT 8
ENTRY           A34901 #type complete
TITLE           lysine carboxypeptidase (EC 3.4.17.3) 83K chain - human
ORGANISM        #formal_name Homo sapiens #common_name man
DATE            20-M-1990 #sequence_revision 20-Jul-1990 #text_change
                  24-Sep-1998
ACCESSIONS      A34901
REFERENCE       A34901
   #authors     Tan, F.; Weerasinghe, D.K.; Skidgel, R.A.; Tamei, H.; Kaul,
                  R.K.; Roninson, I.B.; Schilling, J.W.; Erdoes, E.G.
   #journal     J. Biol. Chem. (1990) 265:13-19
   #title       The deduced protein sequence of the human carboxypeptidase N
                  high molecular weight subunit reveals the presence of
                  leucine-rich tandem repeats.
   #cross-references MUID:90094386
   #accession   A34901
      ##status preliminary
      ##molecule_type mRNA
      ##residues 1-536 ##label TAN
      ##cross-references GB:J05158; NID:g179935; PID:g179936
GENETICS
   #gene        GDB:ACBP
      ##cross-references GDB:127893
   #map_position 6q25.3-6q26
CLASSIFICATION  #superfamily leucine-rich alpha-2-glycoprotein repeat
                  homology
KEYWORDS        hydrolase; metallo-carboxypeptidase
FEATURE
   77-100           #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR1\
   101-124          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR2\
   125-148          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR3\
   149-172          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR4\
   173-196          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR5\
   197-220          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR6\
   221-244          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR7\
   245-268          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR8\
   269-292          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR9\
   293-316          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LR10\
   317-340          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LR11\
   341-364          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LR12
SUMMARY             #length 536 #molecular-weight 58649 #checksum 8569

Query Match 26.8%; Score 260; DB 2; Length 536; 

Best Local Similarity 32.6%; Pred. No. 3.79e-19; 

Matches 43; Conservative 31; Mismatches 58; Indels 0; Gaps 0; 

Db 77 RLEDLEVTGSSFLNLSTNIFSNLTSLGKLTLNFNMLEALPEGLFQHLAALESLHLQGNQL 136 

:| I:: : : : hi I :| II I |: :|| II I I I I II: 
Qy 1 HLRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQI 60 

Db 137 QALPRRLFQPLTHLKTLNLAQNLLAQLPEELFHPLTSLQTLKLSNNALSGLPQGVFGKLG 196 

Ihll: I: :| I I I :: : : |::| |: I |:|| :: |: : | : 
Qy 61 QAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMP 120 

Db 197 SLQELFLDSNNI 208 

I: : I III: 
Qy 121 KLRTFRLHSNNL 132 
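
The "Best Local Similarity" figures in these result blocks are consistent with the number of identical positions divided by the total number of alignment columns (Matches + Conservative + Mismatches + Indels). The report itself does not state this formula; it is inferred here from the listed counts, and the function name below is hypothetical. A minimal Python sketch under that assumption:

```python
def best_local_similarity(matches: int, conservative: int,
                          mismatches: int, indels: int) -> float:
    """Percentage of alignment columns that are exact matches.

    Assumption: the denominator is the sum of all four column counts
    printed in the report. This is inferred from the values, not taken
    from MPsrch documentation.
    """
    columns = matches + conservative + mismatches + indels
    if columns <= 0:
        raise ValueError("alignment must contain at least one column")
    return round(100.0 * matches / columns, 1)


# Counts copied from the RESULT 8 and RESULT 7 blocks of this report.
print(best_local_similarity(43, 31, 58, 0))  # reported as 32.6%
print(best_local_similarity(48, 24, 59, 0))  # reported as 36.6%
```

The same relation also reproduces the value for a gapped alignment in this report (Matches 46, Conservative 34, Mismatches 52, Indels 3, reported as 34.1%).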



RESULT 9 

ENTRY           A60164 #type complete
TITLE           platelet membrane glycoprotein V precursor - human
ORGANISM        #formal_name Homo sapiens #common_name man
DATE            12-Jan-1993 #sequence_revision 24-Feb-1994 #text_change
                  17-Mar-1999
ACCESSIONS      A48030; A60164; A35483; B35483; C35483; A60432; A47507;
                  S34329
REFERENCE       A48030
   #authors     Lanza, F.; Morales, M.; de La Salle, C.; Cazenave, J.P.;
                  Clemetson, K.J.; Shimomura, T.; Phillips, D.R.
   #journal     J. Biol. Chem. (1993) 268:20801-20807
   #title       Cloning and characterization of the gene encoding the human
                  platelet glycoprotein V. A member of the leucine-rich
                  glycoprotein family cleaved during thrombin-induced
                  platelet activation.
   #cross-references MUID:94012616
   #accession   A48030
      ##molecule_type DNA
      ##residues 1-560 ##label LA2
      ##cross-references EMBL:Z23091; NID:g312501; PID:g312502
REFERENCE       A60164
   #authors     Shimomura, T.; Fujimura, K.; Maehama, S.; Takemoto, M.; Oda,
                  K.; Fujimoto, T.; Oyama, R.; Suzuki, M.; Ichihara-Tanaka,
                  K.; Titani, K.; Kuramoto, A.
   #journal     Blood (1990) 75:2349-2356
   #title       Rapid purification and characterization of human platelet
                  glycoprotein V: the amino acid sequence contains
                  leucine-rich repetitive modules as in glycoprotein Ib.
   #cross-references MUID:90275263
   #accession   A60164
      ##molecule_type protein
      ##residues 365-384,'X',386-390,'X',392-395,'X',397;188-208,'I',210;
                 27-50,T,52-53;174-180,'X',182-187;12M44;145-172;
                 290-297,'X',299-311,'X',313-326,'I' -142-151,'X',
                 153-163;'YNTPDRXLAXYGGF';81-105,'XX',108,'T';61-72,
                 'TK',75-77;'V',56-57;'G',479-487,'X',489-498,'X',500,
                 'X',502-503,'X',505,'X',507-508,'D' ##label SHI

REFERENCE       A35483
   #authors     Roth, G.J.; Church, T.A.; McMullen, B.A.; Williams, S.A.
   #journal     Biochem. Biophys. Res. Commun. (1990) 170:153-161
   #title       Human platelet glycoprotein V: a surface leucine-rich
                  glycoprotein related to adhesion.
   #cross-references MUID:90321220
   #accession   A35483
      ##molecule_type protein
      ##residues 145-166,'I',168-169,'X',171-172 ##label ROT
      ##note    this proteolytic fragment was designated peptide M392
   #accession   B35483
      ##molecule_type protein
      ##residues 121-129,'W',131-135;466-468,'X',470 ##label R02
      ##note    this material was designated peptide M393 but may
                  contain two peptides
   #accession   C35483
      ##molecule_type protein
      ##residues 252-266,'H',268-272,'X',274-279,'I',281-284,'I',286
                  ##label R03
      ##note    this proteolytic fragment was designated peptide M401
REFERENCE       A60432
   #authors     Zafar, R.S.; Walz, D.A.
   #journal     Thromb. Res. (1989) 53:31-44
   #title       Platelet membrane glycoprotein V: characterization of the
                  thrombin-sensitive glycoprotein from human platelets.
   #cross-references MUID:89162331
   #accession   A60432
      ##molecule_type protein
      ##residues 477-478,'FX',481-485,'E',487,'V',489-492,'NQ',495,'E',
                  497-498 ##label ZAF
REFERENCE       A47507
   #authors     Hickey, M.J.; Hagen, F.S.; Yagi, M.; Roth, G.J.
   #journal     Proc. Natl. Acad. Sci. U.S.A. (1993) 90:8327-8331
   #title       Human platelet glycoprotein V: characterization of the
                  polypeptide and the related Ib-V-IX receptor system of
                  adhesive, leucine-rich glycoproteins.
   #cross-references MUID:93391348
   #accession   A47507
      ##status preliminary; translated from GB/EMBL/DDBJ
      ##molecule_type mRNA
      ##residues 1-560 ##label RES
      ##cross-references GB:L11238; NID:g388759; PID:g388760
COMMENT         This platelet membrane protein is a substrate for thrombin.
COMMENT         The amino end of the intact protein is blocked.
COMMENT         This protein is absent in Bernard-Soulier syndrome.
GENETICS
   #gene        GDB:GP5
      ##cross-references GDB:230236; OMIM:173511
   #map_position 5pter-5qter
CLASSIFICATION  #superfamily leucine-rich alpha-2-glycoprotein repeat
                  homology
KEYWORDS        blocked amino end; glycoprotein; platelet; tandem repeat;
                  transmembrane protein
SUMMARY         #length 560 #molecular-weight 60958 #checksum 7673

Query Match 26.6%; Score 258; DB 2; Length 560; 

Best Local Similarity 32.8%; Pred. No. 7.04e-19; 

Matches 43; Conservative 37; Mismatches 51; Indels 0; Gaps 0; 

Db 76 LQRLMISDSHISAVAPGTFSDLIKLKTLRLSRNKITHLPGALLDKMVLLEQLFLDHNALR 135 

I: I : :::lh: hi II I llhlh: :| h Mil :: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 136 GIDQNMFQKLVNLQELALNQNQLDFLPASLFTNLENLKLLDLSGNNLTHLPKGLLGAQAR 195 

:| =: h I" =1 h lh : : I I :| :| h Ihhh : : :| 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121 

Db 196 LERLLLHSNRL 206 

I : llll I 
Qy 122 LRTFRLHSNNL 132 



RESULT 10
ENTRY           A41915 #type complete
TITLE           insulin-like growth factor-binding complex acid-labile chain
                  precursor - human
ALTERNATE_NAMES Acid-Labile Subunit (ALS)
ORGANISM        #formal_name Homo sapiens #common_name man
DATE            31-Dec-1993 #sequence_revision 31-Dec-1993 #text_change
                  20-Mar-1998
ACCESSIONS      A41915
REFERENCE       A41915
   #authors     Leong, S.R.; Baxter, R.C.; Camerato, T.; Dai, J.; Wood, W.I.
   #journal     Mol. Endocrinol. (1992) 6:870-876
   #title       Structure and functional expression of the acid-labile
                  subunit of the insulin-like growth factor-binding protein
                  complex.
   #cross-references MUID:92357025
   #accession   A41915
      ##status preliminary



Tue Jun 1 10:16:01 1999 



US-09-191-647-12.rpr 



Page 7 



      ##molecule_type mRNA; protein
      ##residues 1-605 ##label LEO
      ##cross-references GB:M86826; NID:g184807; PID;
      ##experimental_source liver
      ##note    sequence extracted from NCBI backbone (NCBIP:110171)
CLASSIFICATION  #superfamily leucine-rich alpha-2-glycoprotein repeat
                  homology
FEATURE
   75-98            #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR1\
   99-122           #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR2\
   123-146          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR3\
   147-170          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR4\
   171-194          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR5\
   195-218          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR6\
   219-242          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR7\
   243-266          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR8\
   267-290          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR9\
   291-314          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LR10\
   315-338          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LR11\
   339-362          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LR12\
   363-386          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LR13\
   387-410          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LR14\
   411-434          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LR15\
   435-458          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LR16\
   459-482          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LR17\
   483-506          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LR18\
   507-529          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LR19
SUMMARY             #length 605 #molecular-weight 66034 #checksum 1870

Query Match 26.0%; Score 252; DB 2; Length 605; 

Best Local Similarity 32.8%; Pred. No. 4.48e-18; 

Conservative 30; Mismatches 58; Indels 0; Gaps 



Matches 


Db 


220 


Qy 


2 


Db 


280 


Qy 


62 


Db 


340 


Qy 


122 



I! hi I : :| I :| |::| |:|| 



:IH : 



GILEDTFPGLLGLRVLRLSHNAIASLRPRTFKDtHFLEELQLGHNRIRQLAERSFEGLGQ 339 

:l I : :: hi I h : :|: I: II I I :l I :|: lh : 



RESULT 11 

ENTRY           JC5239 #type complete
TITLE           insulin-like growth factor acid-labile chain - baboon
ORGANISM        #formal_name Papio sp. #common_name baboon
DATE            17-Apr-1997 #sequence_revision 09-May-1997 #text_change
                  09-May-1997
ACCESSIONS      JC5239
REFERENCE       JC5239
   #authors     Delhanty, P.; Baxter, R.C.
   #journal     Biochem. Biophys. Res. Commun. (1996) 227:897-902
   #title       The cloning and expression of the baboon acid-labile subunit
                  of the insulin-like growth factor binding protein complex.
   #cross-references MUID:97040714
   #contents    liver
   #accession   JC5239
      ##molecule_type mRNA
      ##residues 1-605 ##label DEL
COMMENT         This factor is structurally related to proinsulin and have
                  insuline-like metabolic, differentiative, and cell proliferative
                  activities.
SUMMARY         #length 605 #molecular-weight 66110 #checksum 1703

Query Match 25.5%; Score 248; DB 2; Length 605; 

Best Local Similarity 32.8%; Pred. No. 1.53e-17; 

Matches 43; Conservative 29; Mismatches 59; Indels 0; Gaps 0; 

Db 220 LRELDLSRNALRAIKANVFAQLPRLQKLYLDRNLIAAVAPGAFLGLKALRWLDLSHNRVA 279 

II hi I : :l I :| h:| Ml : : III I :|||| |:: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 280 GLLEDTFPGLLGLRVLRLSHNAIASLRPRTFEDLHFLEELQLGHNRIRQLAERSFEGLGQ 339 

:: :| I : :: hi I h : :| h II I I :| I :h lh : 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121 

Db 340 LEVLTLDHNQL 350 

I : I I I 
Qy 122 LRTFRLHSNNL 132 



RESULT 12 

ENTRY           JC1282 #type complete
TITLE           insulin-like growth factor-binding protein acid labile chain
                  precursor - rat
ORGANISM        #formal_name Rattus norvegicus #common_name Norway rat
DATE            30-Sep-1993 #sequence_revision 30-Sep-1993 #text_change
                  15-Aug-1997
ACCESSIONS      JC1282
REFERENCE       JC1282
   #authors     Dai, J.; Baxter, R.C.
   #journal     Biochem. Biophys. Res. Commun. (1992) 188:304-309
   #title       Molecular cloning of the acid-labile subunit of the rat
                  insulin-like growth factor binding protein complex.
   #cross-references MUID:93038676
   #accession   JC1282
      ##molecule_type mRNA
      ##residues 1-603 ##label DAI
      ##experimental_source liver
      ##note    the authors translated the codon AAG for residue 63 as
                  Arg, AAA for residue 205 as Pro and GGT for residue
                  260 as Arg
CLASSIFICATION  #superfamily leucine-rich alpha-2-glycoprotein repeat
                  homology
FEATURE
   1-27             #domain signal sequence #status predicted #label SIG\
   28-603           #product insulin-like growth factor binding protein,
                    acid labile chain #status predicted #label MAT
SUMMARY             #length 603 #molecular-weight 66811 #checksum 8075

Query Match 24.9%; Score 242; DB 2; Length 603; 

Best Local Similarity 34.1%; Pred. No. 9.62e-17; 

Matches 46; Conservative 34; Mismatches 52; Indels 3; Gaps 3; 

Db 240 HLPRLQKLYLDRNLITAVAPGAFLGMKALRWLDLSHNRVAGLMEDTFPGLLGLHVLRLAH 299 

II 'II:: I::: III :| I :| |::| : : II I II h 
Qy 1 HLRVLQ-L-ME-NRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSE 57 

Db 300 NAIASLRPRTFKDLHFLEELQLGHNRIRQLGERTFEGLGQLEVLTLNDNQITEVRVGAFS 359 

I I :: ::h : :||| hi : : :| :| :||IMIhl II : |::|: 
Qy 58 NQIQAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFN 117 

Db 360 GLFNVAVMNLSGNCL 374 
: I :| I 
Qy 118 HMPKLRTFRLHSNNL 132 



RESULT 13 

ENTRY           JC6128 #type complete
TITLE           insulin-like growth factor binding complex acid labile chain
                  - mouse
ORGANISM        #formal_name Mus musculus #common_name house mouse
DATE            23-Mar-1997 #sequence_revision 09-May-1997 #text_change
                  10-Sep-1997
ACCESSIONS      JC6128
REFERENCE       JC6128
   #authors     Boisclair, Y.R.; Seto, D.; Hsieh, S.; Hurst, K.R.; Ooi, G.T.
   #journal     Proc. Natl. Acad. Sci. U.S.A. (1996) 93:10028-10033
   #title       Organization and chromosomal localization of the gene
                  encoding the mouse acid labile subunit of the insulin-like
                  growth factor binding complex.
   #cross-references MUID:96413591
   #accession   JC6128
      ##molecule_type DNA
      ##residues 1-603 ##label BOI
      ##cross-references GB:U66900; NID:g1621612; PID:g1621613
COMMENT         This protein is a serum protein and it is of the ternary complex in
                  the physiology of circulating insulin-like growth factor.
GENETICS
   #gene        als
   #map_position 17
SUMMARY         #length 603 #molecular-weight 66959 #checksum 7670

Query Match 24.8%; Score 241; DB 2; Length 603; 

Best Local Similarity 30.5%; Pred. No. 1.31e-16; 

Matches 40; Conservative 32; Mismatches 59; Indels 0; Gaps 0; 

Db 220 LRELDLSRNALRSVKANVFIHLPRLQKLYLDRNLITAVAPRAFLGMKALRWLDLSHNRVA 279 

M :!!::: II |::| |:|| : : ||| | :|||| :: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 280 GLLEDTFPGLLGLHVLRLAHNAITSLRPRTFKDLHFLEELQLGHNRIRQLGEKTFEGLGQ 339 

:: :| I : : hi I |: : :|: |: II I I :| | :|: :|: : 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121 

Db 340 LEVLTLNDNQI 350 

I : h I : 
Qy 122 LRTFRLHSNNL 132 



RESULT 14 

ENTRY           S40241 #type complete
TITLE           G protein-coupled receptor - great pond snail
ORGANISM        #formal_name Lymnaea stagnalis #common_name great pond snail
DATE            06-Jan-1995 #sequence_revision 06-Jan-1995 #text_change
                  09-Sep-1997
ACCESSIONS      S40241
REFERENCE       S40241
   #authors     Tensen, C.P.; Kesteren, E.R.; Planta, R.J.; Cox, K.; Burke,
                  J.F.; Heerikhuizen, H.; Vreugdenhil, E.
   #submission  submitted to the EMBL Data Library, June 1993
   #description A G protein-coupled receptor with LDL-binding motifs suggests
                  a role for lipoproteins in G-linked signal transduction.
   #accession   S40241
      ##status preliminary
      ##molecule_type mRNA
      ##residues 1-1115 ##label TEN
      ##cross-references EMBL:Z23104; NID:g438128; PID:g438129
CLASSIFICATION  #superfamily LDL receptor ligand-binding repeat homology;
                  leucine-rich alpha-2-glycoprotein repeat homology
KEYWORDS        G protein-coupled receptor; transmembrane protein
FEATURE
   38-77            #domain LDL receptor ligand-binding repeat homology
                    #label LDL1\
   79-113           #domain LDL receptor ligand-binding repeat homology
                    #label LDL2\
   158-194          #domain LDL receptor ligand-binding repeat homology
                    #label LDL3\
   233-267          #domain LDL receptor ligand-binding repeat homology
                    #label LDL4\
   322-361          #domain LDL receptor ligand-binding repeat homology
                    #label LDL5\
   367-401          #domain LDL receptor ligand-binding repeat homology
                    #label LDL6\
   446-483          #domain LDL receptor ligand-binding repeat homology
                    #label LDL7\
   488-523          #domain LDL receptor ligand-binding repeat homology
                    #label LDL8\
   584-607          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR1\
   608-631          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR2\
   632-655          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR3\
   656-679          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR4\
   704-727          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR5\
   774-797          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR6
SUMMARY             #length 1115 #molecular-weight 125864 #checksum 74

Query Match 24.4%; Score 237; DB 2; Length 1115; 

Best Local Similarity 28.6%; Pred. No. 4.41e-16; 

Matches 38; Conservative 40; Mismatches 55; Indels 0; Gaps 0; 

Db 609 LTHLNLADNNITSLKNGSLLGLSMQLHINGNKIETIEEDTFSSMIHLTVLDLSNQRLT 668 

MM I::: |:: I :| :|::| |::: : | | : :| 1 1 1 1 : :: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 669 HVYKNMFKGLKQITVLNISRNQINSIDNGAFNNLANVRLIDLSGNVIKDIGQKVFMGLPR 728 

: ■■ hi :l I : III: |::||| I :: :: h I I :: I :|: 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121 

Db 729 LVELKTDSYRFCC 741 

I :: I : I 
Qy 122 LRTFRLHSNNLYC 134 



RESULT 15 

ENTRY           NBHUA2 #type complete
TITLE           leucine-rich alpha-2-glycoprotein - human
ORGANISM        #formal_name Homo sapiens #common_name man
DATE            27-Nov-1985 #sequence_revision 27-Nov-1985 #text_change
                  05-Dec-1998
ACCESSIONS      A03211
REFERENCE       A03211
   #authors     Takahashi, N.; Takahashi, Y.; Putnam, F.W.
   #journal     Proc. Natl. Acad. Sci. U.S.A. (1985) 82:1906-1910
   #title       Periodicity of leucine and tandem repetition of a 24-amino
                  acid segment in the primary structure of leucine-rich
                  alpha-2-glycoprotein of human serum.
   #cross-references MUID:85166241
   #accession   A03211
      ##molecule_type protein
      ##residues 1-312 ##label TAK
COMMENT         The function of this plasma protein is not known.
CLASSIFICATION  #superfamily leucine-rich alpha-2-glycoprotein; leucine-rich
                  alpha-2-glycoprotein repeat homology; proteoglycan
                  carboxyl-terminal homology
KEYWORDS        duplication; glycoprotein; plasma; tandem repeat
FEATURE
   58-81            #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR1\
   82-105           #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR2\
   106-129          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR3\
   130-153          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR4\
   154-177          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR5\
   178-201          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR6\
   202-225          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR7\
   226-249          #domain leucine-rich alpha-2-glycoprotein repeat
                    homology #label LRR8\
   262-309          #domain proteoglycan carboxyl-terminal homology #label
                    PCH\
   2                #binding_site carbohydrate (Thr) (covalent) #status
                    experimental
   8-21,268-294     #disulfide_bonds #status experimental
   44,151,234,290   #binding_site carbohydrate (Asn) (covalent) #status
                    experimental\
                    #binding_site carbohydrate (Asn) (covalent) #status
                    absent
SUMMARY             #length 312 #molecular-weight 34346 #checksum 6045

Query Match 24.2%; Score 235; DB 1; Length 312; 

Best Local Similarity 31.8%; Pred. No. 8.10e-16; 

Matches 42; Conservative 39; Mismatches 49; Indels 2; Gaps 2; 


Db 


107 


Qy 


2 


Db 


166 


Qy 


61 


Db 


226 


Qy 


121 



I I 1:11:: :| 



I! I :| |: I I: :| I |: I | |||:|||: 



1:1: I : 



Search completed: Fri May 28 09:30:18 1999 
Job time : 21 secs. 
Release 3.1A John F. Collins, Biocomputing Research Unit. 
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm 

Run on: Fri May 28 09:30:36 1999; MasPar time 5.87 Seconds 

645.690 Million cell updates/sec 

Tabular output not generated. 

Title:          US-09-191-647-12 
Description:    (1-134) from US09191647.pep 
Perfect Score:  971 
                1 HLRVLQLMENRISTIERGAF SFNHMPKLRTFRLHSNNLYC 134 

Scoring table:  PAM 150 
                Gap 11 

77977 seqs, 28268293 residues 

Post-processing: Minimum Match 0% 

Listing first 45 summaries 

Database: swiss-prot37 
1:swissprot 

Statistics: Mean 46.321; Variance 107.872; scale 0.429 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
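
The Query Match percentages listed for this run are consistent with each hit's score divided by the Perfect Score of 971 reported for the query. That relation is inferred from the listed values rather than stated in the report, and the function name below is hypothetical. A minimal Python sketch under that assumption:

```python
def query_match_percent(score: int, perfect_score: int) -> float:
    """Score expressed as a percentage of the query's perfect score.

    Assumption: "Query Match" is score / perfect score, rounded to one
    decimal place. Inferred from the report's numbers, not from MPsrch
    documentation.
    """
    if perfect_score <= 0:
        raise ValueError("perfect_score must be positive")
    return round(100.0 * score / perfect_score, 1)


PERFECT_SCORE = 971  # "Perfect Score" reported for the 134-residue query

# Scores of the first four summaries in this run.
for score in (366, 298, 267, 264):
    print(score, query_match_percent(score, PERFECT_SCORE))
```

For the four scores shown, this reproduces the listed Query Match values of 37.7, 30.7, 27.5 and 27.2.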



                                SUMMARIES

             Query
 No.  Score  Match  Length  DB  ID           Description              Pred. No.
   1    366   37.7    1480   1  SLIT_DROME   SLIT PROTEIN PRECURSOR   1.73e-39
   2    298   30.7     682   1  CONN_DROME   CONNECTIN PRECURSOR.     1.77e-28
   3    267   27.5     567   1  GPV_RAT      PLATELET GLYCOPROTEIN    1.37e-23
   4    264   27.2    1134   1  CHAO_DROME   CHAOPTIN PRECURSOR (PH   4.01e-23
   5    260   26.8     536   1  CBP8_HUMAN   CARBOXYPEPTIDASE N 83    1.68e-22
   6    259   26.7     567   1  GPV_MOUSE    PLATELET GLYCOPROTEIN    2.40e-22
   7    258   26.6     560   1  GPV_HUMAN    PLATELET GLYCOPROTEIN    3.43e-22
   8    252   26.0     605   1  ALS_HUMAN    INSULIN-LIKE GROWTH FA   2.91e-21
   9    248   25.5     605   1  ALS_PAPPA    INSULIN-LIKE GROWTH FA   1.20e-20
  10    242   24.9     603   1  ALS_RAT      INSULIN-LIKE GROWTH FA   1.00e-19
  11    241   24.8     603   1  ALS_MOUSE    INSULIN-LIKE GROWTH FA   1.42e-19
  12    237   24.4    1115   1  GPCR_LYMST   G-PROTEIN COUPLED RECE   5.79e-19
  13    235   24.2     312   1  A2GL_HUMAN   LEUCINE-RICH ALPHA-2-G   1.17e-18
  14    222   22.9     361   1  CHAD_BOVIN   CHONDROADHERIN PRECURS   1.07e-16
  15    222   22.9     662   1  GARP_HUMAN   GARP PROTEIN PRECURSOR   1.07e-16
  16    213   21.9     368   1  PGS1_HUMAN   BONE/CARTILAGE PROTEOG   2.36e-15
  17    213   21.9     369   1  PGS1_MOUSE   BONE/CARTILAGE PROTEOG   2.36e-15
  18    213   21.9     369   1  PGS1_RAT     BONE/CARTILAGE PROTEOG   2.36e-15
  19    210   21.6     369   1  PGS1_CANFA   BONE/CARTILAGE PROTEOG   6.58e-15
  20    210   21.6     369   1  PGS1_BOVIN   BONE/CARTILAGE PROTEOG   6.58e-15
  21    208   21.4    1097   1  TOLL_DROME   TOLL PROTEIN PRECURSOR   1.30e-14
  22    205   21.1     357   1  PGS2_CHICK   BONE PROTEOGLYCAN II P   3.60e-14
  23    202   20.8     360   1  PGS2_BOVIN   BONE PROTEOGLYCAN II P   9.91e-14
  24    196   20.2     359   1  PGS2_HUMAN   BONE PROTEOGLYCAN II P   7.43e-13
  25    196   20.2     360   1  PGS2_CANFA   BONE PROTEOGLYCAN II P   7.43e-13
  26    190   19.6     354   1  PGS2_MOUSE   BONE PROTEOGLYCAN II P   5.48e-12
  27    189   19.5     382   1  PARG_HUMAN   PROLARGIN PRECURSOR (P   7.64e-12
  28    188   19.4     925   1  GLHR_ANTEL   PROBABLE GLYCOPROTEIN    1.06e-11
  29    182   18.7    1257   1  FLIH_CAEEL   FLIGHTLESS-I PROTEIN H   7.65e-11
  30    180   18.5     354   1  PGS2_RAT     BONE PROTEOGLYCAN II P   1.47e-10
  31    180   18.5     376   1  FMOD_MOUSE   FIBROMODULIN PRECURSOR   1.47e-10
  32    180   18.5     376   1  FMOD_RAT     FIBROMODULIN PRECURSOR   1.47e-10
  33    180   18.5     626   1  GPBA_HUMAN   PLATELET GLYCOPROTEIN    1.47e-10
  34    179   18.4     277   1  RSU1_MOUSE   RAS SUPPRESSOR PROTEIN   2.04e-10
  35    178   18.3     277   1  RSU1_HUMAN   RAS SUPPRESSOR PROTEIN   2.82e-10
  36    175   18.0    1839   1  CYAA_SACKL   ADENYLATE CYCLASE (EC    7.46e-10
  37    171   17.6     376   1  FMOD_HUMAN   FIBROMODULIN PRECURSOR   2.70e-09
  38    171   17.6    2145   1  CYAA_PODAN   ADENYLATE CYCLASE (EC    2.70e-09
  39    165   17.0     375   1  FMOD_BOVIN   FIBROMODULIN PRECURSOR   1.83e-08
  40    164   16.9     701   1  LSHR_BOVIN   LUTROPIN-CHORIOGONADOT   2.52e-08
  41    162   16.7     676   1  LSHR_CALJA   LUTROPIN-CHORIOGONADOT   4.73e-08
  42    162   16.7     764   1  TSHR_SHEEP   THYROTROPIN RECEPTOR P   4.73e-08
  43    160   16.5     763   1  TSHR_BOVIN   THYROTROPIN RECEPTOR P   8.87e-08
  44    159   16.4    2026   1  CYAA_YEAST   ADENYLATE CYCLASE (EC    1.21e-07
  45    157   16.2    1692   1  CYAA_SCHPO   ADENYLATE CYCLASE (EC    2.27e-07

RESULT 1
ID   SLIT_DROME     STANDARD;      PRT;  1480 AA.
AC   P24014;
DT   01-MAR-1992 (REL. 21, CREATED)
DT   01-MAR-1992 (REL. 21, LAST SEQUENCE UPDATE)
DT   01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE)
DE   SLIT PROTEIN PRECURSOR.
GN   SLI.
OS   DROSOPHILA MELANOGASTER (FRUIT FLY).
OC   EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC   PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC   DROSOPHILIDAE; DROSOPHILA.
RN   [1]
RP   SEQUENCE FROM N.A.
RX   MEDLINE; 91099665.
RA   ROTHBERG J.M., JACOBS J.R., GOODMAN C.S., ARTAVANIS-TSAKONAS S.;
RT   "Slit: an extracellular protein necessary for development of midline
RT   glia and commissural axon pathways contains both EGF and LRR

RL   GENES DEV. 4:2169-2187(1990).
CC   -!- FUNCTION: NECESSARY FOR DEVELOPMENT OF MIDLINE GLIA AND
CC       COMMISSURAL AXON PATHWAYS. SLIT MAY INTERACT WITH EXTRACELLULAR
CC       MATRIX MOLECULES.
CC   -!- TISSUE SPECIFICITY: EXCRETED BY THE MIDLINE GLIA CELLS AND
CC       EVENTUALLY DISTRIBUTED ALONG THE AXONS.
CC   -!- ALTERNATIVE PRODUCTS: GIVES RISE TO 2 DISTINCT PROTEINS DIFFERING
CC       BY 11 AA AT THE C-TERMINUS OF THE LAST EGF REPEAT.
CC   -!- SIMILARITY: CONTAINS 7 EGF-LIKE DOMAINS.
CC   -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC       MANY PROTEINS. NUMBER IN THIS PROTEIN: 22. TWO BLOCK OF 6 LRR'S
CC       AND TWO BLOCKS OF 5 LRR'S.
CC   -!- SIMILARITY: CONTAINS A C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK).
CC   This SWISS-PROT entry is copyright. It is produced through a collaboration
CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license@isb-sib.ch).
DR   EMBL; X53959; G8615; -.
DR   PIR; A36665; A36665.
DR   FLYBASE; FBgn0003425; sli.
DR   PROSITE; PS00010; ASX_HYDROXYL; 3.
DR   PROSITE; PS00022; EGF_1; 7.
DR   PROSITE; PS01185; CTCK_1; 1.
DR   PROSITE; PS01186; EGF_2; 5.
DR   PROSITE; PS01187; EGF_CA; 2.
DR   PROSITE; PS01225; CTCK_2; 1.
DR   PFAM; PF00007; Cys_knot; 1.
DR   PFAM; PF00008; EGF; 7.
DR   PFAM; PF00054; laminin_G; 1.
DR   PFAM; PF00560; LRR; 10.
DR   HSSP; P00740; 1IXA.
KW   NEUROGENESIS; GLYCOPROTEIN; SIGNAL; ALTERNATIVE SPLICING;
KW   EGF-LIKE DOMAIN; REPEAT; LEUCINE-REPEAT; DUPLICATION.



FT   SIGNAL        1     36
FT   CHAIN        37   1480       SLIT PROTEIN.
FT   DOMAIN       70    104       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      105    230       LEUCINE-RICH REPEATS (1ST REGION).
FT   DOMAIN      231    294       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN                       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      327    452       LEUCINE-RICH REPEATS (2ND REGION).
FT   DOMAIN      453    518       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN      519    550       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      551    653       LEUCINE-RICH REPEATS (3RD REGION).
FT   DOMAIN      654    714       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN                       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN      747    848       LEUCINE-RICH REPEATS (4TH REGION).
FT   DOMAIN                       CONSERVED C-FLANKING REGION OF THE LRR.


FT 


RPPFAT 


105 


115 






KtiFfcAl 


116 


139 


LRR 1*2. 


PT 


DPDPAT 
ftLrLnl 






tod 




REPEAT 


164 


1«7 


TDD 1^' 


FT 


RPPPAT 
(\LrLnl 


188 


211 




FT 


REPEAT 


212 


230 


TRR 

. !?' 


FT 


REPEAT 


327 


337 






REPEAT 


338 


361 


TRR 00 


PT 


REPEAT 


362 


385 


LRR 2"3. 


PT 


DPDP1LT 

KLrLftl 




409 


LRR 2*4, 


PT 


DPDPAT 
KLrLftl 








PT 


RPPFAT 


434 


452 


TRR 7-fi' 


PT 


RPPPAT 
KLrLftl 


551 


562 


TDD 11 


FT 


RPPPAT 
ftLrLnl 


563 


586 


TRR ;o 


PT 


DVDPAT 
KLrLftl 


587 




TOO ' ' 


PT 


RPPPAT 
KLrLftl 


611 


634 


TOO 1 J 




RPPPAT 
RLrLnl 


635 


653 


TDD \ \' 




RPPPAT 
rvLrLftl 


747 




TRR i-1 

TDD \ ■> 




DPDPAT 
KLPLftl 




791 


TDD ! \ 


PT 


RVDPAT 
KLrLftl 


7f» 








REPEAT 


806 


829 


TRR 


FT 


RPPPAT 
KLrLftl 


830 


848 




PT 








EGF-LIKE 1. 


FT 


DOMAIN 


946 


983 


Lor Hist i. 


FT 


DOMAIN 


985 


1022 


Pfip-TTifP "X PArrTrTM-p.TunTMr 1 /DrYrpuriTiM 

Lor Llnd J, LnLLlUPl rUHUlNo [ rViLHilAbJ 




DOMAIN 


1024 


1062 


LOr IjlnL 4 • 


V 


DOMAIN 


1064 


1100 


Pf.P-TTIfP "i PATrTTTW-RTMriTMr 1 1 PfiTPNTTAT \ 




DOMAIN 


1111 


1149 


Lur Lill\L O. 


FT DOMAIN 1353 1392 EGF-LIKE 7.
FT DOMAIN 1409 1480 CTCK.
FT VARSPLIC 1394 1404 MISSING (IN SHORT FORM).
FT CARBOHYD 111 111 POTENTIAL.
FT CARBOHYD 207 207 POTENTIAL.
FT CARBOHYD 357 357 POTENTIAL.
FT CARBOHYD 435 435 POTENTIAL.
FT CARBOHYD 783 783 POTENTIAL.
FT CARBOHYD 788 788 POTENTIAL.
FT CARBOHYD 958 958 POTENTIAL.
FT CARBOHYD 998 998 POTENTIAL.
FT CARBOHYD 1060 1060 POTENTIAL.
FT CARBOHYD 1159 1159 POTENTIAL.
FT CARBOHYD 1175 1175 POTENTIAL.
FT CARBOHYD 1243 1243 POTENTIAL.
FT CARBOHYD 1292 1292 POTENTIAL.
FT DISULFID 911 922 BY SIMILARITY.
FT DISULFID 916 932 BY SIMILARITY.
FT DISULFID 934 943 BY SIMILARITY.
FT DISULFID 950 961 BY SIMILARITY.
FT DISULFID 955 971 BY SIMILARITY.
FT DISULFID 973 982 BY SIMILARITY.
FT DISULFID 989 1001 BY SIMILARITY.
FT DISULFID 995 1010 BY SIMILARITY.
FT DISULFID 1012 1021 BY SIMILARITY.
FT DISULFID 1028 1041 BY SIMILARITY.
FT DISULFID 1035 1050 BY SIMILARITY.
FT DISULFID 1052 1061 BY SIMILARITY.
FT DISULFID 1068 1079 BY SIMILARITY.
FT DISULFID 1073 1088 BY SIMILARITY.
FT DISULFID 1090 1099 BY SIMILARITY.
FT DISULFID 1115 1125 BY SIMILARITY.
FT DISULFID 1120 1137 BY SIMILARITY.
FT DISULFID 1139 1148 BY SIMILARITY.
FT DISULFID 1357 1368 BY SIMILARITY.
FT DISULFID 1362 1380 BY SIMILARITY.
FT DISULFID 1382 1391 BY SIMILARITY.
FT DISULFID 1409 1443 BY SIMILARITY.
FT DISULFID 1423 1457 BY SIMILARITY.
FT DISULFID 1434 1473 BY SIMILARITY.
FT DISULFID 1438 1475 BY SIMILARITY.
FT DISULFID 1442 1479 BY SIMILARITY.
SQ SEQUENCE 1480 AA; 165752 MW; 2CD1C421 CRC32;


Query Match 37.7%; Score 366; DB 1; L



Best Local Similarity 40.0%; Pred. No. 1.73e-39; 
Matches 52; Conservative 31; Mismatches 47; Indels 0; Gaps 0; 

Db 105 LELQGNNLTVIYETDFQRLTKLRMLQLTDNQIHTIERNSFQDLVSLERLDISNNVITTVG 164 

1:1 I :: I II I I |:| I :: : I I |||:|:| I :: 
Qy 5 LQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQAIP 64 

Db 165 RRVFKGAQSLRSLQLDNNQITCLDEHAFKGLVELEILTLNNNNLTSLPHNIFGGLGRLRA 224 

I: 1:11 :::llll UN::; ||::| : 1 1 : 1 1 1 1 1 1 1 : 1 |: I : :||: 
Qy 65 RKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPKLRT 124 

Db 225 LRLSDNPFAC 234 

:H 1:1 
Qy 125 FRLHSNNLYC 134 
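
The percentages in the summary lines of each result can be cross-checked with a few lines of code. The sketch below assumes (the report itself does not say so) that Best Local Similarity is the number of identical positions divided by the total number of alignment columns, that is Matches + Conservative + Mismatches + Indels. The function names are illustrative and are not part of the search program.

```python
import re

def parse_counts(line):
    """Parse a line such as 'Matches 52; Conservative 31; ...' into a dict of ints."""
    return {key: int(val) for key, val in re.findall(r"([A-Za-z]+)\s+(\d+);", line)}

def local_similarity(counts):
    """Percent identity over all alignment columns (assumed definition)."""
    columns = (counts["Matches"] + counts["Conservative"]
               + counts["Mismatches"] + counts["Indels"])
    return 100.0 * counts["Matches"] / columns

counts = parse_counts("Matches 52; Conservative 31; Mismatches 47; Indels 0; Gaps 0;")
print(f"{local_similarity(counts):.1f}%")
```

Under this assumption the counts reported for RESULT 1 give 52 / 130 columns, which agrees with the 40.0% printed in the report.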



RESULT 2 

ID CONN_DROME STANDARD; PRT; 682 AA.

AC Q01819;

DT 01-OCT-1993 (REL. 27, CREATED) 

DT 01-OCT-1993 (REL. 27, LAST SEQUENCE UPDATE) 

DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE) 

DE CONNECTIN PRECURSOR.

GN CON.

OS DROSOPHILA MELANOGASTER (FRUIT FLY).

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA.

RN [1]

RP SEQUENCE FROM N.A.

RX MEDLINE; 92370678.

RA NOSE A., MAHAJAN V.B., GOODMAN C.S.;

RT "Connectin: a homophilic cell adhesion molecule expressed on a subset

RT of muscles and the motoneurons that innervate them in Drosophila.";

RL CELL 70:553-567(1992).

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=OREGON-R;

RX MEDLINE; 93202002.

RA GOULD A.P., WHITE R.A.H.;

RT "Connectin, a target of homeotic gene control in Drosophila."; 

RL DEVELOPMENT 116:1163-1174(1992). 

CC -!- FUNCTION: CELL ADHESION PROTEIN INVOLVED IN TARGET RECOGNITION
CC DURING NEUROMUSCULAR DEVELOPMENT. MEDIATES HOMOPHILIC CELLULAR
CC ADHESION.

CC -!- SUBCELLULAR LOCATION: ATTACHED TO THE MEMBRANE BY A GPI-ANCHOR.

CC -!- TISSUE SPECIFICITY: PREDOMINANTLY EXPRESSED IN ABDOMINAL AND
CC THORACIC SEGMENT MUSCLE AND MOTORNEURON CELLS.

CC -!- DEVELOPMENTAL STAGE: EMBRYO.
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cc 


CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 10.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; M96647; G157084; -.
DR EMBL; X68701; G7738; -.
DR PIR; S28464; S28464.
DR PIR; A43318; A43318.
DR FLYBASE; FBgn0005775; Con.
DR PFAM; PF00560; LRR; 5.
KW CELL ADHESION; DEVELOPMENTAL PROTEIN; EMBRYO; SIGNAL; GPI-ANCHOR;
KW LEUCINE-REPEAT; REPEAT.




FT SIGNAL 1 24
FT CHAIN 25 665 CONNECTIN.
FT PROPEP 666 682 REMOVED IN MATURE FORM (POTENTIAL).
FT DOMAIN 142 381 LEUCINE-RICH REPEATS.
FT REPEAT 142 165 LRR 1.
FT REPEAT 166 189 LRR 2.
FT REPEAT 190 213 LRR 3.
FT REPEAT 214 237 LRR 4.
FT REPEAT 238 261 LRR 5.
FT REPEAT 262 285 LRR 6.
FT REPEAT 286 299 LRR 7.
FT REPEAT 300 322 LRR 8.
FT REPEAT 324 347 LRR 9.
FT REPEAT 348 381 LRR 10.
FT LIPID 665 665 GPI-ANCHOR (POTENTIAL).
FT CONFLICT 631 631 E -> G (IN REF. 2).
FT CONFLICT 674 677 QVAL -> VALM (IN REF. 2).
SQ SEQUENCE 682 AA; 75992 MW; 3E15592A CRC32;



Query Match 30.7%; Score 298; DB 1; Length 682;

Best Local Similarity 38.3%; Pred. No. 1.77e-28; 

Matches 51; Conservative 28; Mismatches 52; Indels 2; Gaps 2; 

Db 223 RLRELNLEHNQIFEMDRYAFRNL-PLCERLFLNNNNISTLHEGLFADMARLTFLNLAHNQ 281 
:H I I hi ::| lh:| I III II II: : I II III |:|: 
Qy 1 HLRVLQLMENRISTIERGAFQDLKEL-ERLRLNRNNLQLFPELLFLGTARLYRLDLSENQ 59

Db 282 INVLTSEIFRGLGNLNVLKLTRNNLNFIGDTVFAELWSLSELELDDNRIERISERALDGL 341
I : III ::: I I I :: I I I |: I I |::| | |:| ::: : 
Qy 60 IQAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHM 119 



Db 342 NTLKTLNLRNNLL 354 

1 1:1: |::| I 
Qy 120 PKLRTFRLHSNNL 132 
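
The feature (FT) lines of each entry follow a key, start, end, description pattern, which makes them easy to load into a small data structure. The sketch below is an illustration only: it assumes whitespace-separated fields as they appear in this listing (the original SWISS-PROT flat files use fixed columns), and the `Feature` type and `parse_ft_line` helper are hypothetical names introduced here.

```python
from typing import NamedTuple, Optional

class Feature(NamedTuple):
    key: str
    start: int
    end: int
    description: str

def parse_ft_line(line: str) -> Optional[Feature]:
    """Parse one whitespace-separated FT line; return None if it does not fit."""
    parts = line.split(None, 4)
    if len(parts) < 4 or parts[0] != "FT":
        return None
    if not (parts[2].isdigit() and parts[3].isdigit()):
        return None
    description = parts[4].strip() if len(parts) > 4 else ""
    return Feature(parts[1], int(parts[2]), int(parts[3]), description)

# Sample lines taken from the connectin entry (RESULT 2).
lines = [
    "FT SIGNAL 1 24",
    "FT CHAIN 25 665 CONNECTIN.",
    "FT REPEAT 142 165 LRR 1.",
]
features = [f for f in map(parse_ft_line, lines) if f is not None]
for f in features:
    # Print the key, the length of the annotated region, and the description.
    print(f.key, f.end - f.start + 1, f.description)
```

Lines that do not match the pattern are skipped rather than guessed at, which matters for a listing like this one where some feature rows are incomplete.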



RESULT 3 

ID GPV_RAT STANDARD; PRT; 567 AA.

AC O08770;

DT 15-JUL-1998 (REL. 36, CREATED) 

DT 15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE PLATELET GLYCOPROTEIN V PRECURSOR (GPV) (CD42D).

GN GP5.

OS RATTUS NORVEGICUS (RAT).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=WISTAR; TISSUE=LIVER;

RX MEDLINE; 97275136. 

RA RAVANAT C., MORALES M., AZORSA D.O., MOOG S., SCHUHLER S.,

RA GRUNERT P., LOEW D., VAN DORSSELAER A., CAZENAVE J.-P., LANZA F.;

RT "Gene cloning of rat and mouse platelet glycoprotein V:

RT identification of megakaryocyte-specific promoters and demonstration

RT of functional thrombin cleavage.";

RL BLOOD 89:3253-3262(1997).

CC -!- FUNCTION: THE GPIB-V-IX COMPLEX FUNCTIONS AS THE VON WILLEBRAND
CC FACTOR RECEPTOR AND MEDIATES VON WILLEBRAND FACTOR-DEPENDENT
CC PLATELET ADHESION TO BLOOD VESSELS. THE ADHESION OF PLATELETS TO
CC INJURED VASCULAR SURFACES IN THE ARTERIAL CIRCULATION IS A
CC CRITICAL INITIATING EVENT IN HEMOSTASIS (BY SIMILARITY).

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 15.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; Z69594; E222201; -.

DR PFAM; PF00560; LRR; 8.

KW PLATELET; TRANSMEMBRANE; GLYCOPROTEIN; BLOOD COAGULATION;

KW REPEAT; LEUCINE-REPEAT; CELL ADHESION; SIGNAL.



FT SIGNAL 1 16 POTENTIAL.
FT CHAIN 17 567 PLATELET GLYCOPROTEIN V.
FT DOMAIN 17 522 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 523 543 POTENTIAL.
FT DOMAIN 544 567 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 55 415 LEUCINE-RICH REPEATS.
FT REPEAT 55 78 LRR 1.
FT REPEAT 79 102 LRR 2.
FT REPEAT 103 126 LRR 3.
FT REPEAT 127 150 LRR 4.
FT REPEAT 151 174 LRR 5.
FT REPEAT 175 198 LRR 6.
FT REPEAT 199 222 LRR 7.
FT REPEAT 223 246 LRR 8.
FT REPEAT 247 270 LRR 9.
FT REPEAT 271 294 LRR 10.
FT REPEAT 295 318 LRR 11.
FT REPEAT 319 343 LRR 12.
FT REPEAT 346 367 LRR 13.
FT REPEAT 368 391 LRR 14.
FT REPEAT 392 415 LRR 15.
FT CARBOHYD 51 51 POTENTIAL.
FT CARBOHYD 181 181 POTENTIAL.
FT CARBOHYD 243 243 POTENTIAL.
FT CARBOHYD 298 298 POTENTIAL.
FT CARBOHYD 312 312 POTENTIAL.
FT CARBOHYD 385 385 POTENTIAL.
FT CARBOHYD 498 498 POTENTIAL.
SQ SEQUENCE 567 AA; 63344 MW; ABAEC91D CRC32;



Query Match 27.5%; Score 267; DB 1; Length 567;

Best Local Similarity 32.6%; Pred. No. 1.37e-23; 

Matches 43; Conservative 32; Mismatches 56; Indels 1; Gaps 1; 

Db 220 LTELRLERNHLRSIAPGAFDSLGNLSTLTLSGNLLESLPPALFLHVSWLTRLTLFENPLE 279 

I 1:1 I:: :| III: I :| I I: I |: :| III ::| II I II :: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 280 ELPEVLFGEMAGLRELWLNGTHLRTLPAAAFRNLSGLQTLGLTRNPLLSALPPGMFHGLT 339

:| I :::| |: :: : :||| I |: I I I : : |: : |: : 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNI-TRLSVASFNHMP 120 

Db 340 ELRVLAVHTNAL 351 

II : :|:| I 
Qy 121 KLRTFRLHSNNL 132 
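
The Matches and Indels figures quoted for each alignment can be spot-checked directly from the Db and Qy rows. The sketch below counts identical, gapped, and remaining columns for one pair of aligned rows; `count_columns` is a hypothetical helper, and it deliberately lumps conservative substitutions and mismatches together, because telling them apart would need the PAM 150 scoring matrix, which is not reproduced in this listing.

```python
def count_columns(db_row: str, qy_row: str, gap: str = "-"):
    """Return (identical, gapped, other) column counts for two aligned rows."""
    if len(db_row) != len(qy_row):
        raise ValueError("aligned rows must have equal length")
    identical = gapped = other = 0
    for a, b in zip(db_row, qy_row):
        if a == gap or b == gap:
            gapped += 1      # an indel column in either sequence
        elif a == b:
            identical += 1   # same residue in both sequences
        else:
            other += 1       # conservative substitution or mismatch
    return identical, gapped, other

# Final segment of the RESULT 3 alignment (Db 340-351 against Qy 121-132).
print(count_columns("ELRVLAVHTNAL", "KLRTFRLHSNNL"))
```

Summing the first value over all segments of an alignment should reproduce the reported Matches count, provided the residue rows themselves were transcribed without error.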



RESULT 4
ID CHAO_DROME
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AC P12024; 

DT 01-OCT-1989 (REL. 12, CREATED) 

DT 01-OCT-1989 (REL. 12, LAST SEQUENCE UPDATE)

DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)

DE CHAOPTIN PRECURSOR (PHOTORECEPTOR CELL-SPECIFIC MEMBRANE PROTEIN).

GN CHPORCHT. 

OS DROSOPHILA MELANOGASTER (FRUIT FLY).

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 88135762. 

RA REINKE R., KRANTZ D.E., YEN D., ZIPURSKY S.L.;

RT "Chaoptin, a cell surface glycoprotein required for Drosophila

RT photoreceptor cell morphogenesis, contains a repeat motif found in

RT yeast and human.";

RL CELL 52:291-301(1988). 

CC -!- FUNCTION: REQUIRED FOR DROSOPHILA PHOTORECEPTOR CELL
CC MORPHOGENESIS. MEDIATES HOMOPHILIC CELLULAR ADHESION.

CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR SURFACE OF R-CELL PLASMA
CC MEMBRANE.

CC -!- DEVELOPMENTAL STAGE: EXPRESSED 24 HOURS AFTER INITIATION OF
CC PHOTORECEPTOR CELL DIFFERENTIATION, PERSISTS THROUGH ADULTHOOD.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 41.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; M19017; G157098; -.

DR EMBL; M19008; G157098; JOINED.

DR EMBL; M19009; G157098; JOINED.

DR EMBL; M19010; G157098; JOINED.

DR EMBL; M19011; G157098; JOINED.

DR EMBL; M19012; G157098; JOINED.

DR EMBL; M19013; G157098; JOINED.

DR EMBL; M19014; G157098; JOINED.

DR EMBL; M19016; G157098; JOINED.

DR PIR; A29944; A29944.

DR FLYBASE; FBgn0000313; chp.

DR PFAM; PF00560; LRR; 17.

KW GLYCOPROTEIN; MEMBRANE; SIGNAL; REPEAT; LEUCINE-REPEAT; VISION.



FT SIGNAL 1 29
FT CHAIN 30 1134 CHAOPTIN.
FT CARBOHYD 77 77 POTENTIAL.
FT CARBOHYD 267 267 POTENTIAL.
FT CARBOHYD 305 305 POTENTIAL.
FT CARBOHYD 339 339 POTENTIAL.
FT CARBOHYD 361 361 POTENTIAL.
FT CARBOHYD 422 422 POTENTIAL.
FT CARBOHYD 680 680 POTENTIAL.
FT CARBOHYD 692 692 POTENTIAL.
FT CARBOHYD 718 718 POTENTIAL.
FT CARBOHYD 746 746 POTENTIAL.
FT CARBOHYD 936 936 POTENTIAL.
FT CARBOHYD 970 970 POTENTIAL.
FT CARBOHYD 1012 1012 POTENTIAL.
FT CARBOHYD 1104 1104 POTENTIAL.



SQ SEQUENCE 1134 AA; 130719 MW; B67A6363 CRC32;

Query Match 27.2%; Score 264; DB 1; Length 1134;

Best Local Similarity 35.6%; Pred. No. 4.01e-23;

Matches 48; Conservative 35; Mismatches 47; Indels 5; Gaps 5; 

Db 80 KVFMLHMENTGLREIEP-YFLQSTGMYRLKISGNHLTEIPDDAFTGLERSLWELILPQND 138 

= 1: I III : II I : : ||::: hi :|: I I I I I |::|: 
Qy 3 RVLQL-MENR-ISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTAR-LYRLDLSENQ 59



Db 139 LVEIPSKSLRHLQKLRHLDLGYNHITHIQHDSFRGLEDSLQTLILRENCISQLMSHSFSG 198 

: II l::l ll:|: h :lhl I I: I I :| |::| l|: 

Qy 60 IQAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRD-LEVLTLNNNNITRLSVASFNH 118

Db 199 LLILETLDLSGNNLF 213 

; I I: I illl: 
Qy 119 MPKLRTFRLHSNNLY 133 



RESULT 5 

ID CBP8_HUMAN STANDARD; PRT; 536 AA.

AC P22792; 

DT 01-AUG-1991 (REL. 19, CREATED) 

DT 01-AUG-1991 (REL. 19, LAST SEQUENCE UPDATE) 

DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE)

DE CARBOXYPEPTIDASE N 83 KD CHAIN (CARBOXYPEPTIDASE N REGULATORY 

DE SUBUNIT) (FRAGMENT). 

GN CPN2. 

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=LIVER;

RX MEDLINE; 90094386.

RA TAN F., WEERASINGHE D.K., SKIDGEL R.A., TAMEI H., KAUL R.K.,

RA RONINSON I.B., SCHILLING J.W., ERDOES E.G.;

RT "The deduced protein sequence of the human carboxypeptidase N high

RT molecular weight subunit reveals the presence of leucine-rich tandem

RT repeats.";

RL J. BIOL. CHEM. 265:13-19(1990).

RN [2]

RP PARTIAL SEQUENCE.

RX MEDLINE; 88309120.

RA SKIDGEL R.A., BENNETT C.D., SCHILLING J.W., TAN F., WEERASINGHE D.K.,

RA ERDOES E.G.;

RT "Amino acid sequence of the N-terminus and selected tryptic peptides

RT of the active subunit of human plasma carboxypeptidase N: comparison

RT with other carboxypeptidases.";

RL BIOCHEM. BIOPHYS. RES. COMMUN. 154:1323-1329(1988).

CC -!- FUNCTION: THE 83 KD SUBUNIT BINDS AND STABILIZES THE CATALYTIC 

CC SUBUNIT AT 37 DEGREES CELSIUS AND KEEPS IT IN CIRCULATION. UNDER 

CC SOME CIRCUMSTANCES IT MAY BE AN ALLOSTERIC MODIFIER OF THE 

CC CATALYTIC SUBUNIT.

CC -!- SUBUNIT: TETRAMER OF TWO CATALYTIC CHAINS AND TWO GLYCOSYLATED

CC INACTIVE CHAINS.

CC -!- SUBCELLULAR LOCATION: SECRETED.

CC -!- PTM: O-GLYCOSYLATED IN THE SER/THR-RICH REGION (POTENTIAL).

CC -!- PTM: WHETHER OR NOT ANY CYS RESIDUES PARTICIPATE IN INTRACHAIN

CC BONDS IS UNKNOWN, BUT THEY DO NOT FORM INTERCHAIN DISULFIDE BONDS

CC WITH THE 50 KD CATALYTIC SUBUNIT.

CC -!- DISEASE: A COMPLETE ABSENCE OF THE ENZYME IS NOT CONSIDERED TO BE

CC COMPATIBLE WITH LIFE.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN

CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 12.

CC -!- SIMILARITY: SOME, TO E.COLI YDDK.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; J05158; G179936; -. 

DR PIR; A34901; A34901. 

DR MIM; 603104; -.

DR PFAM; PF00560; LRR; 7.

KW REPEAT; LEUCINE-REPEAT; GLYCOPROTEIN.

FT NON_TER 1 1

FT DOMAIN 68 355 LEUCINE-RICH REPEATS. 
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FT 


FT REPEAT 68 91 LRR 1.
FT REPEAT 92 115 LRR 2.
FT REPEAT 116 139 LRR 3.
FT REPEAT 140 163 LRR 4.
FT REPEAT 164 187 LRR 5.
FT REPEAT 188 211 LRR 6.
FT REPEAT 212 235 LRR 7.
FT REPEAT 236 259 LRR 8.
FT REPEAT 260 283 LRR 9.
FT REPEAT 284 307 LRR 10.
FT REPEAT 308 331 LRR 11.
FT REPEAT 332 355 LRR 12.
FT DOMAIN 359 379 THR/SER-RICH.
FT CARBOHYD 53 53 POTENTIAL.
FT CARBOHYD 90 90 POTENTIAL.
FT CARBOHYD 98 98 POTENTIAL.
FT CARBOHYD 207 207 POTENTIAL.
FT CARBOHYD 245 245 POTENTIAL.
FT CARBOHYD 327 327 POTENTIAL.
FT CARBOHYD 338 338 POTENTIAL.
FT CARBOHYD 495 495 POTENTIAL.
SQ SEQUENCE 536 AA; 58649 MW; C4413E03 CRC32;



Query Match 26.8%; Score 260; DB 1; Length 536; 

Best Local Similarity 32.6%; Pred. No. 1.68e-22; 

Matches 43; Conservative 31; Mismatches 58; Indels 0; Gaps 0; 

Db 77 RLEDLEWGSSFLNLSTNIFSNLTSLGKLTLNFNMLEALPEGLFQHLAALESLHLQGNQL 136 

I:: : : : | :| | :| || | |: ;|| || | | | | ||: 
Qy 1 HLRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQI 60 

Db 137 QALPRRLFQPLTHLKTLNLAQNLLAQLPEELFHPLTSLQTLKLSNNALSGLPQGVFGKLG 196

11:11: I: :| I I I :: : : |::| |: I 1 : 1 1 |: : | : 
Qy 61 QAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMP 120 

Db 197 SLQELFLDSNNI 208 

I: : I III: 
Qy 121 KLRTFRLHSNNL 132 



RESULT 6 

ID GPV_MOUSE STANDARD; PRT; 567 AA.

AC O08742;

DT 15-JUL-1998 (REL. 36, CREATED) 

DT 15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE PLATELET GLYCOPROTEIN V PRECURSOR (GPV) (CD42D).

GN GP5.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=C57BL/6; TISSUE=LIVER;

RX MEDLINE; 97275136.

RA RAVANAT C., MORALES M., AZORSA D.O., MOOG S., SCHUHLER S.,

RA GRUNERT P., LOEW D., VAN DORSSELAER A., CAZENAVE J.-P., LANZA F.;

RT "Gene cloning of rat and mouse platelet glycoprotein V:

RT identification of megakaryocyte-specific promoters and demonstration 

RT of functional thrombin cleavage."; 

RL BLOOD 89:3253-3262(1997). 

CC -!- FUNCTION: THE GPIB-V-IX COMPLEX FUNCTIONS AS THE VON WILLEBRAND
CC FACTOR RECEPTOR AND MEDIATES VON WILLEBRAND FACTOR-DEPENDENT
CC PLATELET ADHESION TO BLOOD VESSELS. THE ADHESION OF PLATELETS TO
CC INJURED VASCULAR SURFACES IN THE ARTERIAL CIRCULATION IS A
CC CRITICAL INITIATING EVENT IN HEMOSTASIS (BY SIMILARITY).

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 15.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; Z69595; E222202; -.

DR MGD; MGI:1096363; GP5.

DR PFAM; PF00560; LRR; 7. 

KW PLATELET; TRANSMEMBRANE; GLYCOPROTEIN; BLOOD COAGULATION; 

KW REPEAT; LEUCINE-REPEAT; CELL ADHESION; SIGNAL. 



FT SIGNAL 1 16 POTENTIAL.
FT CHAIN 17 567 PLATELET GLYCOPROTEIN V.
FT DOMAIN 17 522 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 523 543 POTENTIAL.
FT DOMAIN 544 567 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 55 415 LEUCINE-RICH REPEATS.
FT REPEAT 55 78 LRR 1.
FT REPEAT 79 102 LRR 2.
FT REPEAT 103 126 LRR 3.
FT REPEAT 127 150 LRR 4.
FT REPEAT 151 174 LRR 5.
FT REPEAT 175 198 LRR 6.
FT REPEAT 199 222 LRR 7.
FT REPEAT 223 246 LRR 8.
FT REPEAT 247 270 LRR 9.
FT REPEAT 271 294 LRR 10.
FT REPEAT 295 318 LRR 11.
FT REPEAT 319 343 LRR 12.
FT REPEAT 346 367 LRR 13.
FT REPEAT 368 391 LRR 14.
FT REPEAT 392 415 LRR 15.
FT CARBOHYD 51 51 POTENTIAL.
FT CARBOHYD 67 67 POTENTIAL.
FT CARBOHYD 181 181 POTENTIAL.
FT CARBOHYD 243 243 POTENTIAL.
FT CARBOHYD 298 298 POTENTIAL.
FT CARBOHYD 312 312 POTENTIAL.
FT CARBOHYD 385 385 POTENTIAL.
SQ SEQUENCE 567 AA; 63467 MW; 3AE7515E CRC32;



Query Match 26.7%; Score 259; DB 1; Length 567; 

Best Local Similarity 32.6%; Pred. No. 2.40e-22; 

Matches 43; Conservative 31; Mismatches 57; Indels 1; Gaps 1; 

Db 220 LTELRLERNHLRSVAPGAFDRLGNLSSLTLSGNLLESLPPALFLHVSSVSRLTLFENPLE 279 

I hi I:: :: III: I :| I h I |: :| III : : II I II :: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 280 ELPDVLFGEMAGLRELWLNGTHLSTLPAAAFRNLSGLQTLGLTRNPRLSALPRGVFQGLR 339 

:| I :::| |: ::| : :||| I |: I I I :: |: : | : 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNN-NITRLSVASFNHMP 120 

Db 340 ELRVLGLHTNAL 351 

II : I f : I I 
Qy 121 KLRTFRLHSNNL 132 



RESULT 7 

ID GPV_HUMAN STANDARD; PRT; 560 AA.

AC P40197; 

DT 01-FEB-1995 (REL. 31, CREATED) 

DT 01-FEB-1995 (REL. 31, LAST SEQUENCE UPDATE) 

DT 01-FEB-1995 (REL. 31, LAST ANNOTATION UPDATE) 

DE PLATELET GLYCOPROTEIN V PRECURSOR (GPV) (CD42D). 

GN GP5. 

OS HOMO SAPIENS (HUMAN) . 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=LUNG;

RX MEDLINE; 93391348.

RA HICKEY M.J., HAGEN F.S., YAGI M., ROTH G.J.;
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RT "Human platelet glycoprotein V: characterization of the polypeptide 

RT and the related Ib-V-IX receptor system of adhesive, leucine-rich

RT glycoproteins.";

RL PROC. NATL. ACAD. SCI. U.S.A. 90:8327-8331(1993). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC TISSUE=PLATELET;

RX MEDLINE; 94012616.

RA LANZA F., MORALES M., DE LA SALLE C., CAZENAVE J.-P., CLEMETSON K.J.,

RA SHIMOMURA T., PHILLIPS D.R.;

RT "Cloning and characterization of the gene encoding the human platelet

RT glycoprotein V. A member of the leucine-rich glycoprotein family

RT cleaved during thrombin-induced platelet activation.";

RL J. BIOL. CHEM. 268:20801-20807(1993).

RN [3]

RP PARTIAL SEQUENCE.

RC TISSUE=PLATELET;

RX MEDLINE; 90275263.

RA SHIMOMURA T., FUJIMURA K., MAEHAMA S., TAKEMOTO M., ODA K.,

RA FUJIMOTO T., OYAMA R., SUZUKI M., ICIHARA-TANAKA K., TITANI K.,

RA KURAMOTO A.;

RT "Rapid purification and characterization of human platelet
RT glycoprotein V: the amino acid sequence contains leucine-rich
RT repetitive modules as in glycoprotein Ib.";

RL BLOOD 75:2349-2356(1990).

RN [4]

RP PARTIAL SEQUENCE.

RC TISSUE=PLATELET;

RX MEDLINE; 90321220.

RA ROTH G.J., CHURCH T.A., MCMULLEN B.A., WILLIAMS S.A.;

RT "Human platelet glycoprotein V: a surface leucine-rich glycoprotein

RT related to adhesion.";

RL BIOCHEM. BIOPHYS. RES. COMMUN. 170:153-161(1990). 

CC -!- FUNCTION: THE GPIB-V-IX COMPLEX FUNCTIONS AS THE VON WILLEBRAND

CC FACTOR RECEPTOR AND MEDIATES VON WILLEBRAND FACTOR-DEPENDENT

CC PLATELET ADHESION TO BLOOD VESSELS. THE ADHESION OF PLATELETS TO

CC INJURED VASCULAR SURFACES IN THE ARTERIAL CIRCULATION IS A

CC CRITICAL INITIATING EVENT IN HEMOSTASIS.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- TISSUE SPECIFICITY: PLATELETS AND MEGAKARYOCYTES.

CC -!- PTM: THE N-TERMINAL IS BLOCKED.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 15.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

DR EMBL; L11238; G388760; -. 

DR EMBL; Z23091; G312502; -. 

DR MIM; 173511; -. 

DR PFAM; PF00560; LRR; 8. 

DR HSSP; P16473; 1XUM. 

KW PLATELET; TRANSMEMBRANE; GLYCOPROTEIN; BLOOD COAGULATION; 

KW REPEAT; LEUCINE-REPEAT; CELL ADHESION; SIGNAL. 



FT SIGNAL 1 16 POTENTIAL.
FT CHAIN 17 560 PLATELET GLYCOPROTEIN V.
FT DOMAIN 17 523 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 524 544 POTENTIAL.
FT DOMAIN 545 560 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 55 415 LEUCINE-RICH REPEATS.
FT REPEAT 55 78 LRR 1.
FT REPEAT 79 102 LRR 2.
FT REPEAT 103 126 LRR 3.
FT REPEAT 127 150 LRR 4.
FT REPEAT 151 174 LRR 5.
FT REPEAT 175 198 LRR 6.
FT REPEAT 199 222 LRR 7.
FT REPEAT 223 246 LRR 8.
FT REPEAT 247 270 LRR 9.
FT REPEAT 271 294 LRR 10.
FT REPEAT 295 318 LRR 11.
FT REPEAT 319 343 LRR 12.
FT REPEAT 346 367 LRR 13.
FT REPEAT 368 391 LRR 14.
FT REPEAT 392 415 LRR 15.


FT CARBOHYD 51 51
FT CARBOHYD 181 181
FT CARBOHYD 243 243 POTENTIAL.
FT CARBOHYD 267 267 POTENTIAL.
FT CARBOHYD 298 298
FT CARBOHYD 312 312
FT CARBOHYD 385 385
FT CARBOHYD 499 499

FT CONFLICT 73 74 MT -> TK (IN REF. 2).
FT CONFLICT 109 109 K -> T (IN REF. 2).
FT CONFLICT 130 130 D -> W (IN REF. 3).
FT CONFLICT 136 138 GID -> PGG (IN REF. 3).
FT CONFLICT 209 209 L -> I (IN REF. 2).
FT CONFLICT 267 267 N -> H (IN REF. 3).
FT CONFLICT 327 327 L -> I (IN REF. 2).
FT CONFLICT 478 478 P -> G (IN REF. 2).
FT CONFLICT 509 509 P -> D (IN REF. 2).
SQ SEQUENCE 560 AA; 60959 MW; FD65EDD2 CRC32;



Query Match 26.6%; Score 258; DB 1; Length 560;

Best Local Similarity 32.8%; Pred. No. 3.43e-22; 

Matches 43; Conservative 37; Mismatches 51; Indels 0; Gaps 0; 

Db 76 LQRLMISDSHISAVAPGTFSDLIKLKTLRLSRNKITHLPGALLDKMVLLEQLFLDHNALR 135 

I: ! : :::lh: hi II I llhll:: :| |: I :| I I :: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 136 GIDQNMFQKLVNLQELALNQNQLDFLPASLFTNLENLKLLDLSGNNLTHLPKGLLGAQAK 195 

:| " h I" :| I: lh : : I I :| :| |: ||:|:|: : : :| 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121 

Db 196 LERLLLHSNRL 206 

I : MM I 
Qy 122 LRTFRLHSNNL 132 



RESULT 8 

ID ALS_HUMAN STANDARD; PRT; 605 AA.

AC P35858;

DT 01-JUN-1994 (REL. 29, CREATED)

DT 01-JUN-1994 (REL. 29, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE INSULIN-LIKE GROWTH FACTOR BINDING PROTEIN COMPLEX ACID LABILE CHAIN

DE PRECURSOR (ALS).

GN IGFALS OR ALS.

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A., AND PARTIAL SEQUENCE. 

RC TISSUE=LIVER;

RX MEDLINE; 92357025.

RA LEONG S.R., BAXTER R.C., CAMERATO T., DAI J., WOOD W.I.;

RT "Structure and functional expression of the acid-labile subunit of

RT the insulin-like growth factor-binding protein complex.";

RL MOL. ENDOCRINOL. 6:870-876(1992).

RN [2] 

RP SEQUENCE OF 28-35. 

RX MEDLINE; 89308584. 

RA BAXTER R.C., MARTIN J.L., BENIAC V.A.; 

RT "High molecular weight insulin-like growth factor binding protein 

RT complex. Purification and properties of the acid-labile subunit from 

RT human serum."; 

RL J. BIOL. CHEM. 264:11843-11848(1989).

CC -!- FUNCTION: INVOLVED IN PROTEIN-PROTEIN INTERACTIONS THAT RESULT
CC IN PROTEIN COMPLEXES, RECEPTOR-LIGAND BINDING OR CELL ADHESION. 
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SUBUNIT : FORMS A TERNARY COMPLEX OF ABOUT 140 TO 150 KD WITH IGF- I 

CC OR IGF-II AND IGFBP-3.

CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR.

CC -!- TISSUE SPECIFICITY: PLASMA.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 20.

CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

DR EMBL; M86826; G184808; -.

DR PIR; A41915; A41915.

DR MIM; 601489; -.
DR PFAM; PF00560; LRR; 10.

DR HSSP; P23945; 1XUN.

KW GLYCOPROTEIN; LEUCINE-REPEAT; 

FT SIGNAL 1 27
FT CHAIN 28 605 INSULIN-LIKE GROWTH FACTOR BINDING
FT PROTEIN, ACID LABILE CHAIN.
FT DOMAIN 79 536 LEUCINE-RICH REPEATS.
FT REPEAT 79 89 LRR 1.
FT REPEAT 90 113 LRR 2.
FT REPEAT 114 137 LRR 3.
FT REPEAT 138 161 LRR 4.
FT REPEAT 162 185 LRR 5.
FT REPEAT 186 209 LRR 6.
FT REPEAT 210 233 LRR 7.
FT REPEAT 234 257 LRR 8.
FT REPEAT 258 281 LRR 9.
FT REPEAT 282 305 LRR 10.
FT REPEAT 306 329 LRR 11.
FT REPEAT 330 353 LRR 12.
FT REPEAT 354 377 LRR 13.
FT REPEAT 378 401 LRR 14.
FT REPEAT 402 425 LRR 15.
FT REPEAT 426 449 LRR 16.
FT REPEAT 450 473 LRR 17.
FT REPEAT 474 497 LRR 18.
FT REPEAT 498 521 LRR 19.
FT REPEAT 522 536 LRR 20.
FT CARBOHYD 64 64 POTENTIAL.
FT CARBOHYD 85 85 POTENTIAL.
FT CARBOHYD 96 96 POTENTIAL.
FT CARBOHYD 368 368 POTENTIAL.
FT CARBOHYD 515 515 POTENTIAL.
FT CARBOHYD 580 580 POTENTIAL.
SQ SEQUENCE 605 AA; 66034 MW; B5027E19 CRC32;

Query Match 26.0%; Score 252; DB 1; Length 605;

Best Local Similarity 32.8%; Pred. No. 2.91e-21;

Matches 43; Conservative 30; Mismatches 58; Indels 0; Gaps 0;

Db 220 LRELDLSRNALRAIKANVFVQLPRLQKLYLDRNLIAAVAPGAFLGLKALRWLDLSHNRVA 279

II hi I : :| I :| |::| hll : : III I ;|||l |::
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61

Db 280 GLLEDTFPGLLGLRVLRLSHNAIASLRPRTFKDLHFLEELQLGHNRIRQLAERSFEGLGQ 339

:: :l I : :: hi I h : :|: h II I I :| I :|: II: :
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121

Db 340 LEVLTLDHNQL 350

1:111
Qy 122 LRTFRLHSNNL 132



RESULT 9



RESULT 9 
ID ALSJAPF 
AC 002833; 



PRT; 605 AA. 



DT 01-NOV-1997 (REL. 35, CREATED) 

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE INSULIN-LIKE GROWTH FACTOR BINDING PROTEIN COMPLEX ACID LABILE CHAIN 

DE PRECURSOR (ALS). 

GN IGFALS OR ALS. 

OS PAPIO PAPIO (GUINEA BABOON) . 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC PRIMATES; CATARRHINI; CERCOPITHECIDAE; CERCOPITHECINAE; PAPIO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=LIVER;

RX MEDLINE; 97040714. 

RA DELHANTY P., BAXTER R.C.; 

RT "The cloning and expression of the baboon acid-labile subunit of the 

RT insulin-like growth factor binding protein complex."; 

RL BIOCHEM. BIOPHYS. RES. COMMUN. 227:897-902(1996).

CC -!- FUNCTION: INVOLVED IN PROTEIN-PROTEIN INTERACTIONS THAT RESULT
CC IN PROTEIN COMPLEXES, RECEPTOR-LIGAND BINDING OR CELL ADHESION.

CC -!- SUBUNIT: FORMS A TERNARY COMPLEX OF ABOUT 140 TO 150 KD WITH IGF-I
CC OR IGF-II AND IGFBP-3 (BY SIMILARITY).

CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 20.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; S83462; E323796; -. 
DR PFAM; PF00560; LRR; 10. 
DR HSSP; P23945; 1XUN. 

KW GLYCOPROTEIN; LEUCINE-REPEAT; REPEAT; SIGNAL. 



FT   SIGNAL        1     27       BY SIMILARITY.
FT   CHAIN        28    605       INSULIN-LIKE GROWTH FACTOR BINDING
FT                                PROTEIN, ACID LABILE CHAIN.
FT   DOMAIN       79    536       LEUCINE-RICH REPEATS.
FT   REPEAT       79     89       LRR 1.
FT   REPEAT       90    113       LRR 2.
FT   REPEAT      114    137       LRR 3.
FT   REPEAT      138    161       LRR 4.
FT   REPEAT      162    185       LRR 5.
FT   REPEAT      186    209       LRR 6.
FT   REPEAT      210    233       LRR 7.
FT   REPEAT      234    257       LRR 8.
FT   REPEAT      258    281       LRR 9.
FT   REPEAT      282    305       LRR 10.
FT   REPEAT      306    329       LRR 11.
FT   REPEAT      330    353       LRR 12.
FT   REPEAT      354    377       LRR 13.
FT   REPEAT      378    401       LRR 14.
FT   REPEAT      402    425       LRR 15.
FT   REPEAT      426    449       LRR 16.
FT   REPEAT      450    473       LRR 17.
FT   REPEAT      474    497       LRR 18.
FT   REPEAT      498    521       LRR 19.
FT   REPEAT      522    536       LRR 20.
FT   CARBOHYD     64     64       POTENTIAL.
FT   CARBOHYD     85     85       POTENTIAL.
FT   CARBOHYD     96     96       POTENTIAL.
FT   CARBOHYD    368    368       POTENTIAL.
FT   CARBOHYD    515    515       POTENTIAL.
FT   CARBOHYD    580    580       POTENTIAL.

SQ SEQUENCE 605 AA; 66110 MW; 5DF04D42 CRC32;



Query Match 25.5%; Score 248; DB 1; Length 605;

Best Local Similarity 32.8%; Pred. No. 1.20e-20;

Matches 43; Conservative 29; Mismatches 59; Indels 0; Gaps 0;

Db 220 LRELDLSRNALRAIKANVFAQLPRLQKLYLDRNLIAAVAPGAFLGLKALRWLDLSHNRVA 279

II hi I : :l I :| l::l 1:11 : : III I :|||| |:: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 280 GLLEDTFPGLLGLRVLRLSHNAIASLRPRTFEDLHFLEELQLGHNRIRQLAERSFEGLGQ 339 

:| I : " hi I h : :| h II I I :| I :|: ||: : 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121 

Db 340 LEVLTLDHNQL 350 

1:111 
Qy 122 LRTFRLHSNNL 132 



RESULT 10 

ID ALS_RAT STANDARD; PRT; 603 AA.

AC P35859;

DT 01-JUN-1994 (REL. 29, CREATED)

DT 01-JUN-1994 (REL. 29, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE INSULIN-LIKE GROWTH FACTOR BINDING PROTEIN COMPLEX ACID LABILE CHAIN

DE PRECURSOR (ALS).

GN IGFALS OR ALS.

OS RATTUS NORVEGICUS (RAT).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=LIVER; 

RX MEDLINE; 93038676. 

RA DAI J., BAXTER R.C.;

RT "Molecular cloning of the acid-labile subunit of the rat insulin-like 

RT growth factor binding protein complex."; 

RL BIOCHEM. BIOPHYS. RES. COMMUN. 188:304-309(1992).

RN [2] 

RP SEQUENCE OF 24-44, AND CHARACTERIZATION. 

RC STRAIN=WISTAR; TISSUE=SERUM;

RX MEDLINE; 94130835. 

RA BAXTER R.C., DAI J.; 

RT "Purification and characterization of the acid-labile subunit of rat 

RT serum insulin-like growth factor binding protein complex.";

RL ENDOCRINOLOGY 134:848-852(1994). 

CC -!- FUNCTION: MAY HAVE AN IMPORTANT ROLE IN REGULATING THE ACCESS OF 
CC CIRCULATING IGFS TO THE TISSUES. 

CC -!- SUBUNIT: FORMS A TERNARY COMPLEX OF ABOUT 140 TO 150 KD WITH IGF-I

CC OR IGF-II AND IGFBP-3.

CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR.

CC -!- TISSUE SPECIFICITY: BRAIN, KIDNEY, LUNG, HEART, SPLEEN, MUSCLE
CC AND LIVER.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 20.

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch) . 

CC 

DR EMBL; S46785; E64972; -. 

DR PIR; JC1282; JC1282. 

DR PFAM; PF00560; LRR; 10. 

DR HSSP; P23945; 1XUN. 

KW GLYCOPROTEIN; LEUCINE-REPEAT; REPEAT; SIGNAL. 



FT   SIGNAL        1     23
FT   CHAIN        24    603       INSULIN-LIKE GROWTH FACTOR BINDING
FT                                PROTEIN, ACID LABILE CHAIN.
FT   DOMAIN       79    535       LEUCINE-RICH REPEATS.
FT   REPEAT       79     89       LRR 1.
FT   REPEAT       90    113       LRR 2.
FT   REPEAT      114    137       LRR 3.
FT   REPEAT      138    161       LRR 4.
FT   REPEAT      162    185       LRR 5.
FT   REPEAT      186    209       LRR 6.
FT   REPEAT      210    233       LRR 7.
FT   REPEAT      234    257       LRR 8.
FT   REPEAT      258    281       LRR 9.
FT   REPEAT      282    305       LRR 10.
FT   REPEAT      306    329       LRR 11.
FT   REPEAT      330    353       LRR 12.
FT   REPEAT      354    377       LRR 13.
FT   REPEAT      378    401       LRR 14.
FT   REPEAT      402    425       LRR 15.
FT   REPEAT      426    449       LRR 16.
FT   REPEAT      450    473       LRR 17.
FT   REPEAT      474    497       LRR 18.
FT   REPEAT      498    521       LRR 19.
FT   REPEAT      522    535       LRR 20.
FT   CARBOHYD     64     64       POTENTIAL.
FT   CARBOHYD     85     85       POTENTIAL.
FT   CARBOHYD     96     96       POTENTIAL.
FT   CARBOHYD    368    368       POTENTIAL.
FT   CARBOHYD    515    515       POTENTIAL.
FT   CARBOHYD    578    578       POTENTIAL.
FT   CARBOHYD    586    586       POTENTIAL.

SQ SEQUENCE 603 AA; 66811 MW; 5BB22D53 CRC32;



Query Match 24.9%; Score 242; DB 1; Length 603; 

Best Local Similarity 34.1%; Pred. No. 1.00e-19;

Matches 46; Conservative 34; Mismatches 52; Indels 3; Gaps 3; 

Db 240 HLPRLQKLYLDRNLITAVAPGAFLGMKALRWLDLSHNRVAGLMEDTFPGLLGLHVLRLAH 299 

II II I I III :l I :| |::| : : I I I I I |: 
Qy 1 HLRVLQ-L-ME-NRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSE 57 

Db 300 NAIASLRPRTFKDLHFLEELQLGHNRIRQLGERTFEGLGQLEVLTLNDNQITEVRVGAFS 359 

I I =: ::h : :||| hi : : :| :| :|||||||:| II : |::|: 
Qy 58 NQIQAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFN 117 

Db 360 GLFNVAVMNLSGNCL 374 

: 1=11 
Qy 118 HMPKLRTFRLHSNNL 132 



RESULT 11 

ID ALS_MOUSE STANDARD; PRT; 603 AA.

AC P70389;

DT 01-NOV-1997 (REL. 35, CREATED)

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE INSULIN-LIKE GROWTH FACTOR BINDING PROTEIN COMPLEX ACID LABILE CHAIN 

DE PRECURSOR (ALS). 

GN IGFALS OR ALS OR ALBS. 

OS MUS MUSCULUS (MOUSE) . 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=129/SV;

RX MEDLINE; 96413591. 

RA BOISCLAIR Y.R., SETO D., HSIEH S., HURST K.R., OOI G.T.; 

RT "Organization and chromosomal localization of the gene encoding the 

RT mouse acid labile subunit of the insulin-like growth factor binding 

RT complex."; 

RL PROC. NATL. ACAD. SCI. U.S.A. 93:10028-10033(1996). 

CC -!- FUNCTION: MAY HAVE AN IMPORTANT ROLE IN REGULATING THE ACCESS OF 

CC CIRCULATING IGFS TO THE TISSUES. 

CC -!- SUBUNIT: FORMS A TERNARY COMPLEX OF ABOUT 140 TO 150 KD WITH IGF-I
CC OR IGF-II AND IGFBP-3 (BY SIMILARITY).

CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 20.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; U66900; G1621613; -.

DR MGD; MGI:107973; IGFALS.

DR PFAM; PF00560; LRR; 10.

KW GLYCOPROTEIN; LEUCINE-REPEAT; REPEAT; SIGNAL.


FT   SIGNAL        1     23       BY SIMILARITY.
FT   CHAIN        24    603       INSULIN-LIKE GROWTH FACTOR BINDING
FT                                PROTEIN, ACID LABILE CHAIN.
FT   DOMAIN       79    535       LEUCINE-RICH REPEATS.
FT   REPEAT       79     89       LRR 1.
FT   REPEAT       90    113       LRR 2.
FT   REPEAT      114    137       LRR 3.
FT   REPEAT      138    161       LRR 4.
FT   REPEAT      162    185       LRR 5.
FT   REPEAT      186    209       LRR 6.
FT   REPEAT      210    233       LRR 7.
FT   REPEAT      234    257       LRR 8.
FT   REPEAT      258    281       LRR 9.
FT   REPEAT      282    305       LRR 10.
FT   REPEAT      306    329       LRR 11.
FT   REPEAT      330    353       LRR 12.
FT   REPEAT      354    377       LRR 13.
FT   REPEAT      378    401       LRR 14.
FT   REPEAT      402    425       LRR 15.
FT   REPEAT      426    449       LRR 16.
FT   REPEAT      450    473       LRR 17.
FT   REPEAT      474    497       LRR 18.
FT   REPEAT      498    521       LRR 19.
FT   REPEAT      522    535       LRR 20.
FT   CARBOHYD     64     64       POTENTIAL.
FT   CARBOHYD     85     85       POTENTIAL.
FT   CARBOHYD     96     96       POTENTIAL.
FT   CARBOHYD    368    368       POTENTIAL.
FT   CARBOHYD    515    515       POTENTIAL.
FT   CARBOHYD    578    578       POTENTIAL.
FT   CARBOHYD    586    586       POTENTIAL.

SQ SEQUENCE 603 AA; 66959 MW; 11ADB606 CRC32;



Query Match 24.8%; Score 241; DB 1; Length 603;

Best Local Similarity 30.5%; Pred. No. 1.42e-19;

Matches 40; Conservative 32; Mismatches 59; Indels 0; Gaps 0;

Db 220 LRELDLSRNALRSVKANVFIHLPRLQKLYLDRNLITAVAPRAFLGMKALRWLDLSHNRVA 279
II hi I : :: II |::| |:|| : : III | :|||| |:: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 280 GLLEDTFPGLLGLHVLRLAHNAITSLRPRTFKDLHFLEELQLGHNRIRQLGEKTFEGLGQ 339 

=1 I : : hi I h : :|: h II I I :| I :|: :|; ; 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121 

Db 340 LEVLTLNDNQI 350 

I : I: I : 
Qy 122 LRTFRLHSNNL 132 



RESULT 12

ID GPCR.LYMST STANDARD; PRT; 1115 AA.

AC P46023;

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)

DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE) 

DE G-PROTEIN COUPLED RECEPTOR GRL101 PRECURSOR. 

OS LYMNAEA STAGNALIS (GREAT POND SNAIL).

OC EUKARYOTA; METAZOA; MOLLUSCA; GASTROPODA; PULMONATA; BASOMMATOPHORA; 

OC LYMNAEIDAE; LYMNAEA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=CNS; 

RX MEDLINE; 94255418. 



RA TENSEN C.P., VAN KESTEREN E.R., PLANTA R.J., COX K.J.A., BURKE J.F.,

RA VAN HEERIKHUIZEN H., VREUGDENHIL E.;

RT "A G protein-coupled receptor with low density lipoprotein-binding

RT motifs suggests a role for lipoproteins in G-linked signal

RT transduction.";

RL PROC. NATL. ACAD. SCI. U.S.A. 91:4816-4820(1994).

CC -!- FUNCTION: MIGHT DIRECTLY TRANSDUCE SIGNALS CARRIED BY LARGE

CC EXTRACELLULAR (LIPO)PROTEIN (COMPLEXE)S INTO NEURONAL EVENTS.

CC -!- SUBCELLULAR LOCATION: INTEGRAL MEMBRANE PROTEIN.

CC -!- TISSUE SPECIFICITY: PREDOMINANTLY EXPRESSED IN A SMALL NUMBER OF

CC NEURONS WITHIN THE CENTRAL NERVOUS SYSTEM AND TO A LESSER EXTENT

CC IN THE HEART.

CC -!- SIMILARITY: BELONGS TO FAMILY 1 OF G-PROTEIN COUPLED RECEPTORS.

CC -!- SIMILARITY: CONTAINS 12 LDL-RECEPTOR CLASS A DOMAINS.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN 
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 6. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; Z23104; G438129; -. 

DR PIR; S40241; S40241. 

DR GCRDB; GCRJ860; -. 

DR PROSITE; PS00237; G_PROTEIN_RECEPTOR; FALSE_NEG.

DR PROSITE; PS01209; LDLRA_1; 6.

DR PROSITE; PS50068; LDLRA_2; 11.

DR PFAM; PF00001; 7tm_1; 1.

DR PFAM; PF00057; ldl_recept_a; 11.

DR PFAM; PF00560; LRR; 3. 

DR HSSP; P01130; 1AJJ. 

KW G-PROTEIN COUPLED RECEPTOR; TRANSMEMBRANE; GLYCOPROTEIN; REPEAT; 

KW SIGNAL. 



FT   SIGNAL        1     24       POTENTIAL.
FT   CHAIN        25   1115       G-PROTEIN COUPLED RECEPTOR GRL101.
FT   DOMAIN       25    767       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    768    788       1 (POTENTIAL).
FT   DOMAIN      789    801       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM    802    822       2 (POTENTIAL).
FT   DOMAIN      823    857       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    858    878       3 (POTENTIAL).
FT   DOMAIN      879    887       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM    888    908       4 (POTENTIAL).
FT   DOMAIN      909    941       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    942    962       5 (POTENTIAL).
FT   DOMAIN      963    988       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM    989   1009       6 (POTENTIAL).
FT   DOMAIN     1010   1017       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM   1018   1038       7 (POTENTIAL).
FT   DOMAIN     1039   1115       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       32    523       12 X 40 AA APPROXIMATE TANDEM REPEATS
FT                                SIMILAR TO THE LDL-RECEPTOR CLASS A.
FT   DOMAIN       36     79       LDL-RECEPTOR CLASS A 1.
FT   DOMAIN       77    115       LDL-RECEPTOR CLASS A 2.
FT   DOMAIN      116    155       LDL-RECEPTOR CLASS A 3.
FT   DOMAIN      156    196       LDL-RECEPTOR CLASS A 4.
FT   DOMAIN      195    232       LDL-RECEPTOR CLASS A 5.
FT   DOMAIN      231    269       LDL-RECEPTOR CLASS A 6.
FT   DOMAIN      272    318       LDL-RECEPTOR CLASS A 7.
FT   DOMAIN      320    363       LDL-RECEPTOR CLASS A 8.
FT   DOMAIN      365    403       LDL-RECEPTOR CLASS A 9.
FT   DOMAIN      404    442       LDL-RECEPTOR CLASS A 10.
FT   DOMAIN      444    485       LDL-RECEPTOR CLASS A 11.
FT   DOMAIN      486    525       LDL-RECEPTOR CLASS A 12.
FT   DOMAIN      588    731       LEUCINE-RICH REPEATS.
FT   REPEAT      588    611       LRR 1.
FT   REPEAT      612    635       LRR 2.
FT   REPEAT      636    659       LRR 3.
FT   REPEAT      660    683       LRR 4.
FT   REPEAT      684    707       LRR 5.
FT   REPEAT      708    731       LRR 6.
FT   DISULFID     38     53       BY SIMILARITY.
FT   DISULFID     46     66       BY SIMILARITY.
FT   DISULFID     60     77       BY SIMILARITY.
FT   DISULFID     79     91       BY SIMILARITY.
FT   DISULFID     86    104       BY SIMILARITY.
FT   DISULFID     98    113       BY SIMILARITY.
FT   DISULFID    118    131       BY SIMILARITY.
FT   DISULFID    138    153       BY SIMILARITY.
FT   DISULFID    158    170       BY SIMILARITY.
FT   DISULFID    165    183       BY SIMILARITY.
FT   DISULFID    177    194       BY SIMILARITY.
FT   DISULFID    202    220       BY SIMILARITY.
FT   DISULFID    214    230       BY SIMILARITY.
FT   DISULFID    233    245       BY SIMILARITY.
FT   DISULFID    240    258       BY SIMILARITY.
FT   DISULFID    252    267       BY SIMILARITY.
FT   DISULFID    274    291       BY SIMILARITY.
FT   DISULFID    282    304       BY SIMILARITY.
FT   DISULFID    298    316       BY SIMILARITY.
FT   DISULFID    322    339       BY SIMILARITY.
FT   DISULFID    334    352       BY SIMILARITY.
FT   DISULFID    346    361       BY SIMILARITY.
FT   DISULFID    367    379       BY SIMILARITY.
FT   DISULFID    374    392       BY SIMILARITY.
FT   DISULFID    386    401       BY SIMILARITY.
FT   DISULFID    406    418       BY SIMILARITY.
FT   DISULFID    413    431       BY SIMILARITY.
FT   DISULFID    425    440       BY SIMILARITY.
FT   DISULFID    446    458       BY SIMILARITY.
FT   DISULFID    453    474       BY SIMILARITY.
FT   DISULFID    465    483       BY SIMILARITY.
FT   DISULFID    488    500       BY SIMILARITY.
FT   DISULFID    495    513       BY SIMILARITY.
FT   DISULFID    507    523       BY SIMILARITY.
FT   CARBOHYD     87     87       POTENTIAL.
FT   CARBOHYD    166    166       POTENTIAL.
FT   CARBOHYD    269    269       POTENTIAL.
FT   CARBOHYD    318    318       POTENTIAL.
FT   CARBOHYD    482    482       POTENTIAL.
FT   CARBOHYD    502    502       POTENTIAL.
FT   CARBOHYD    571    571       POTENTIAL.
FT   CARBOHYD    618    618       POTENTIAL.
FT   CARBOHYD    624    624       POTENTIAL.
FT   CARBOHYD    685    685       POTENTIAL.

SQ SEQUENCE 1115 AA; 125865 MW; 2AEC245A CRC32;



Query Match 24.4%; Score 237; DB 1; Length 1115;

Best Local Similarity 28.6%; Pred. No. 5.79e-19;

Matches 38; Conservative 40; Mismatches 55; Indels 0; Gaps 0;



Db 609 LTHLNLADNNITSLKNGSLLGLSNLKQLHINGNKIETIEEDTFSSMIHLTVLDLSNQRLT 668 

I I I :| I::: |:: I :| :|::| |::: : I I : ;| Mil: :: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 669 HVYKNMFKGLKQITVLNISRNQINSIDNGAFNNLANVRLIDLSGNVIKDIGQRVFMGLPR 728 

: :: hi :| I : III: h:||| I :: :: h I I :: I :|: 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121

Db 729 LVELKTDSYRFCC 741 

I :: I : I 
Qy 122 LRTFRLHSNNLYC 134 



RESULT 13 

ID A2GL_HUMAN STANDARD; PRT; 312 AA.

AC P02750;

DT 21-JUL-1986 (REL. 01, CREATED)
DT 21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE)
DT 01-OCT-1994 (REL. 30, LAST ANNOTATION UPDATE)
DE LEUCINE-RICH ALPHA-2-GLYCOPROTEIN (LRG).
OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO.

RN [1]

RP SEQUENCE.

RX MEDLINE; 85166241.

RA TAKAHASHI N., TAKAHASHI Y., PUTNAM F.W.;
RT "Periodicity of leucine and tandem repetition of a 24-amino acid
RT segment in the primary structure of leucine-rich alpha 2-glycoprotein
RT of human serum.";

RL PROC. NATL. ACAD. SCI. U.S.A. 82:1906-1910(1985).

CC -!- FUNCTION: THE FUNCTION OF THIS PLASMA PROTEIN IS NOT KNOWN.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN

CC MANY PROTEINS.
DR PIR; A03211; NBHUA2.
DR SWISS-2DPAGE; P02750; HUMAN.
DR PFAM; PF00560; LRR; 4.

KW PLASMA; GLYCOPROTEIN; REPEAT; LEUCINE-REPEAT.



DISULFID 
DISULFID 



CARBOHYD 
CARBOHYD 



CARBOHYD 
CARBOHYD 



268 
2 

44 
151 
234 
290 
271 

312 AA, 



21 
294 
2 

44 
151 
234 
290 

271 POTENTIAL. 
34346 MW; 48C3DB08 CRC32; 



Query Match 24.2%; Score 235; DB 1; Length 312;

Best Local Similarity 31.8%; Pred. No. 1.17e-18;

Matches 42; Conservative 39; Mismatches 49; Indels 2; Gaps 2;

Db 107 LDTLVLKENQLEVLEVSWLHGLKALGHLDLSGNRLRRLPPGL-LANFTLLRTLDLGENQL 165 

I I hlh: :| : :: II I :| h I h :| I h II llhllh 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYR-LDLSENQI 60

Db 166 ETLPPDLLRGPLQLERLHLEGNKLQVLGKDLLLPQPDLRYLFLNGNKLARVAAGAFQGLR 225 

::':| :lh::: hh I : : : : II I II |:::|:: ::| : 
Qy 61 QAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMP 120

Db 226 QLDMLDLSNNSL 237 

I : I :hl 
Qy 121 KLRTFRLHSNNL 132



RESULT 14 

ID CHAD_BOVIN STANDARD; PRT; 361 AA.

AC Q27972;

DT 01-NOV-1997 (REL. 35, CREATED)

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE)

DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE) 

DE CHONDROADHERIN PRECURSOR (CARTILAGE LEUCINE-RICH PROTEIN) (38 KD BONE 

DE PROTEIN) . 

OS BOS TAURUS (BOVINE) . 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC ARTIODACTYLA; RUMINANTIA; PECORA; BOVOIDEA; BOVIDAE; BOVINAE; BOS.

RN [1]

RP SEQUENCE FROM N.A., AND PARTIAL SEQUENCE.

RC TISSUE=CARTILAGE;

RX MEDLINE; 94342341.

RA NEAME P.J., SOMMARIN Y., BOYNTON R.E., HEINEGARD D.;

RT "The structure of a 38-kDa leucine-rich protein (chondroadherin)

RT isolated from bovine cartilage.";

RL J. BIOL. CHEM. 269:21547-21554(1994).

RN [2]

RP SEQUENCE OF 25-55 AND 77-97.

RC TISSUE=BONE;

RX MEDLINE; 95113864.

RA HU B., COULSON L., MOYER B., PRICE P.A.;

RT "Isolation and molecular cloning of a novel bone phosphoprotein 

RT related in sequence to the cystatin family of thiol protease 

RT inhibitors."; 

RL J. BIOL. CHEM. 270:431-436(1995).

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; DO 018; G470672; -.

DR PFAM; PF00560; LRR; 5.

KW REPEAT; SIGNAL.






FT   SIGNAL        1     24       OR 23 (IN SOME ISOFORM(S)).
FT   CHAIN        25    361       CHONDROADHERIN.
FT   CHAIN        25    352       CHONDROADHERIN, MINOR FORM.
FT   DOMAIN       79    317       10 X 24 AA LEUCINE-RICH TANDEM REPEATS.
FT   REPEAT       79    102       1.
FT   REPEAT      103    126       2.
FT   REPEAT      127    150       3.
FT   REPEAT      151    174       4.
FT   REPEAT      175    198       5.
FT   REPEAT      199    222       6.
FT   REPEAT      223    246       7.
FT   REPEAT      248    271       8.
FT   REPEAT      272    293       9.
FT   REPEAT      294    317       10.
FT   DISULFID    306    348
FT   DISULFID    308    328
FT   CONFLICT     25     25       C -> Y (IN REF. 2).
FT   CONFLICT     29     29       C -> W (IN REF. 2).
FT   CONFLICT     31     31       C -> H (IN REF. 2).
FT   CONFLICT     40     40       C -> L (IN REF. 2).
FT   CONFLICT     52     52       S -> R (IN REF. 2).

SQ SEQUENCE 361 AA; 40884 MW; A370BB91 CRC32;



Query Match 22.9%; Score 222; DB 1; Length 361; 

Best Local Similarity 33.6%; Pred. No. 1.07e-16;

Matches 45; Conservative 30; Mismatches 56; Indels 3; Gaps 3; 

Db 150 NLFILQLNNNKIRELRSGAFQGAKDLRWLYLSENSLSSLQPGAL-DDVENLAKFYLDRNQ 208 

:| :lll :hl : llll hi :| |: hi I I I I :: I II 
Qy 1 HLRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQ-LFPELLFLGTARLYRLDLSENQ 59

Db 209 LSSYPSAALSKLRWEELKLSHNPLKSIPDNAFQSFGRYLETLWLDNTNLEKFSDGAFLG 268 
: : I h : :| I I : I I II::: I II I hi h ::| ::| 

Qy 60 IQAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRAL-RDLEVLTLNNNNITRLSVASFNH 118

Db 269 VTTLKHVHLENNRL 282
: h :| :| I 
Qy 119 MPKLRTFRLHSNNL 132



RESULT 15

ID GARP_HUMAN STANDARD; PRT; 662 AA.

AC Q14392;

DT 01-NOV-1997 (REL. 35, CREATED)
DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE)
DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)
DE GARP PROTEIN PRECURSOR (GARPIN).
GN GARP.

OS HOMO SAPIENS (HUMAN) . 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 
OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 
RN [1] 

RP SEQUENCE FROM N.A. 
RX MEDLINE; 94235567. 

RA OLLENDORFF V., NOGUCHI T., DELAPEYRIERE O., BIRNBAUM D.; 

RT "The GARP gene encodes a new member of the family of leucine-rich 

RT repeat-containing proteins."; 

RL CELL GROWTH DIFFER. 5:213-219(1994).

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN

CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 20. THERE ARE TWO BLOCKS OF

CC 10 LRR'S.



CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC

DR EMBL; Z24680; G439296; -.
DR MIM; 137207; -.
DR PFAM; PF00560; LRR; 10.
KW GLYCOPROTEIN; LEUCINE-REPEAT; REPEAT; TRANSMEMBRANE; SIGNAL.



FT   SIGNAL        1     19       POTENTIAL.
FT   CHAIN        20    662       GARP PROTEIN.
FT   DOMAIN       20    627       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    628    648       POTENTIAL.
FT   DOMAIN      649    662       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       54    292       LEUCINE-RICH REPEATS (1ST
FT   DOMAIN      320    565       LEUCINE-RICH REPEATS (2ND
FT   REPEAT       54     77       LRR 1-1.
FT   REPEAT       78    101       LRR 1-2.
FT   REPEAT      102    128       LRR 1-3.
FT   REPEAT      129    153       LRR 1-4.
FT   REPEAT      154    177       LRR 1-5.
FT   REPEAT      178    201       LRR 1-6.
FT   REPEAT      202    222       LRR 1-7.
FT   REPEAT      223    247       LRR 1-8.
FT   REPEAT      248    269       LRR 1-9.
FT   REPEAT      270    292       LRR 1-10.
FT   REPEAT      320    343       LRR 2-1.
FT   REPEAT      344    367       LRR 2-2.
FT   REPEAT      368    390       LRR 2-3.
FT   REPEAT      391    414       LRR 2-4.
FT   REPEAT      415    438       LRR 2-5.
FT   REPEAT      448    470       LRR 2-6.
FT   REPEAT      471    495       LRR 2-7.
FT   REPEAT      496    518       LRR 2-8.
FT   REPEAT      519    540       LRR 2-9.
FT   REPEAT      541    565       LRR 2-10.
FT   CARBOHYD    203    203       POTENTIAL.
FT   CARBOHYD    271    271       POTENTIAL.
FT   CARBOHYD    308    308       POTENTIAL.
FT   CARBOHYD    345    345       POTENTIAL.
FT   CARBOHYD    545    545       POTENTIAL.



SQ SEQUENCE 662 AA; 71978 MW; D7B74960 CRC32; 

Query Match 22.9%; Score 222; DB 1; Length 662; 

Best Local Similarity 32.8%; Pred. No. 1.07e-16;

Matches 45; Conservative 30; Mismatches 58; Indels 4; Gaps 4;

Db 75 LRHLDLSTNEISFLQPGAFQALTHLEHLSLAHNRLAMATALSAGGLGPLPRVTSLDLSGN 134 

II hi I II llll I 1 1 : 1 I :| I : I II :|: llll I 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPEL-LF-LGT-ARLYRLDLSEN 58

Db 135 SLYSGLLERLLGEAPSLHTLSLAENSLTRLTRHTFRDMPALEQLDLHSNVLMDIEDGAFE 194 

: :: : : I : I I I :: : :|| : II I |::| : : ::|: 
Qy 59 QI-QAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFN 117 

Db 195 GLPRLTHLNLSRNSLTC 211 

:hl : I hi I 
Qy 118 HMPKLRTFRLHSNNLYC 134 



Search completed: Fri May 28 09:30:54 1999 
Job time : 18 secs.
****************************************************************************

Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm
Run on: Fri May 28 09:31:17 1999; MasPar time 14.13 Seconds

517.680 Million cell updates/sec 

Tabular output not generated. 

Title: MJS-09-191-647-12 

Description: (1-134) from US09191647. pep 

Perfect Score: 971 

Sequence: 1 HLRVLQLMENRISTIERGAF SFNHMPKLRTFRLHSNNLYC 134 

Scoring table: PAM 150 
Gap 11 

Searched: 179066 seqs, 54579741 residues



Post-processing: Minimum Match 0%

Listing first 45 summaries 

Database: sptrembl9 

1:sp_archea 2:sp_bacteria 3:sp_fungi 4:sp_human
5:sp_invertebrate 6:sp_mammal 7:sp_mhc 8:sp_organelle
9:sp_phage 10:sp_plant 11:sp_rodent 12:sp_unclassified
13:sp_vertebrate 14:sp_virus

Statistics: Mean 44.205; Variance 111.499; scale 0.396 

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.
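
The Query Match percentage reported for each hit appears to be that hit's score expressed as a percentage of the Perfect Score (971 for this query). The report does not state this rule; it is an assumption that happens to agree with the listed values, and the function name below is hypothetical. A minimal sketch:

```python
def query_match_percent(score: int, perfect_score: int) -> float:
    """Return score as a percentage of the perfect (self-match) score,
    rounded to one decimal place as in the report's Match column.

    Assumption: Match = 100 * Score / Perfect Score. This is inferred
    from the listed values, not documented in the report itself.
    """
    return round(100.0 * score / perfect_score, 1)


if __name__ == "__main__":
    # Scores taken from the summaries for this query (Perfect Score: 971).
    for score in (780, 347, 200):
        print(score, query_match_percent(score, 971))
```

Under this reading, a Match of 100 would correspond to a database sequence that reproduces the whole query exactly.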



Result          Query
   No.   Score  Match  Length  DB  ID      Description             Pred. No.

     1     780   80.3    1523  11  088280  MEGF5.                  1.47e-102
     2     777   80.0    1531  11  088279  MEGF4.                  4.52e-102
     3     347   35.7    1496   4  Q92626  MYELOBLAST KIAA0230 (F  8.67e-34
     4     274   28.2    1066   5  Q18902  CODED FOR BY C. ELEGAN  6.72e-23
     5     270   27.8     811   4  075139  KIAA0644 PROTEIN.       2.58e-22
     6     267   27.5     610   5  Q21604  M88.6 PROTEIN.          7.07e-22
     7     267   27.5    1091  11  P70193  MEMBRANE GLYCOPROTEIN.  7.07e-22
     8     261   26.9     907   4  075473  ORPHAN G PROTEIN-COUPL  5.26e-21
     9     260   26.8     331  13  093233  PHOSPHOLIPASE A2 INHIB  7.34e-21
    10     254   26.2    1385   5  Q26388  TLR-TOLL-LIKE RECEPTOR  5.41e-20
    11     250   25.7    1389   5  Q24591  WHEELER.                2.04e-19
    12     246   25.3     738   5  Q93373  C44H4.2 PROTEIN.        7.65e-19
    13     245   25.2     733   5  Q24250  TARTAN PROTEIN PRECURS  1.06e-18
    14     241   24.8     603  11  070211  INSULIN-LIKE GROWTH FA  3.97e-18
    15     239   24.6     683   5  Q22187  T05A1.3 PROTEIN.        7.66e-18
    16     238   24.5     880   5  P91643  KEK1 PRECURSOR.         1.06e-17
    17     235   24.2     358  11  070210  CHONDROADHERIN.         2.84e-17
    18     234   24.1     358  11  055226  CHONDROADHERIN.         3.94e-17
    19     232   23.9     562   5  021164  SIMILAR TO LRR.         7.57e-17
    20     230   23.7     428   4  014498  ISLR PRECURSOR.         1.45e-16
    21     230   23.7     516   4  043300  KIAA0416.               1.45e-16
    22     230   23.7     660   4  043155  KIAA0405.               1.45e-16
    23     226   23.3    1100   5  Q24622  TOLL PROTEIN.           5.33e-16
    24     224   23.1     789   5  016781  SIMILARITY TO MULTIPLE  1.02e-15
    25     224   23.1    1355   5  016779  CODED FOR BY C. ELEGAN  1.02e-15
    26     222   22.9     713   4  075325  GLIOMA AMPLIFIED ON CH  1.94e-15
    27     218   22.5     892   5  P91644  KEK2 PRECURSOR (FRAGME  7.06e-15
    28     217   22.3     653   5  002329  T23G11.6 PROTEIN.       9.73e-15
    29     215   22.1     359   4  015335  CHONDROADHERIN.         1.85e-14
    30     213   21.9     716  11  Q61809  LEUCINE-RICH-REPEAT PR  3.51e-14
    31     213   21.9     718  13  073675  NEURONAL LEUCINE-RICH   3.51e-14
    32     211   21.7     135   6  046377  BIGLYCAN (FRAGMENT).    6.65e-14
    33     211   21.7     224   5  044086  ZK994.4 PROTEIN.        6.65e-14
    34     210   21.6     369   6  046390  BIGLYCAN PRECURSOR.     9.15e-14
    35     210   21.6     372   6  046403  BIGLYCAN.               9.15e-14
    36     209   21.5     707  11  P97860  LEUCINE-RICH REPEAT PR  1.26e-13
    37     209   21.5     961   5  P90920  K07A12.2 PROTEIN.       1.26e-13
    38     206   21.2     526  10  022753  PREDICTED LEUCINE-RICH  3.27e-13
    39     206   21.2     680   5  Q93374  C44H4.3 PROTEIN.        3.27e-13
    40     203   20.9     458   5  Q93377  C44H4.1 PROTEIN.        8.46e-13
    41     203   20.9    1535   5  Q23991  PEROXIDASE PRECURSOR.   8.46e-13
    42     201   20.7     360   6  046542  DERMATAN SULFATE PROTE  1.59e-12
    43     201   20.7     360   6  028886  DECORIN.                1.59e-12
    44     201   20.7     522   4  043354  BAC CLONE GS099H08, CO  1.59e-12
    45     200   20.6     515   5  015912  RANDOM SLUG CDNA21 PRO  2.18e-12


ALIGNMENTS 



RESULT 1

ID 088280 PRELIMINARY; PRT; 1523 AA.

AC 088280;

DT 01-NOV-1998 (TREMBLREL. 08, CREATED) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE MEGF5.

GN MEGF5. 

OS RATTUS NORVEGICUS (RAT). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;

RX MEDLINE; 98360089. 

RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;

RT "Identification of high-molecular-weight proteins with multiple

RT EGF-like motifs by motif-trap screening.";

RL GENOMICS 51:27-34(1998). 

DR EMBL; AB011531; D1033424; -. 

DR PROSITE; PS01185; CTCK_1; 1. 

DR PROSITE; PS01186; EGF_2; 7. 

DR PROSITE; PS01187; EGF_CA; 2.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 1523 AA; 167767 MW; 2BD845D0 CRC32; 

Query Match 80.3%; Score 780; DB 11; Length 1523;

Best Local Similarity 76.1%; Pred. No. 1.47e-102;

Matches 102; Conservative 23; Mismatches 9; Indels 0; Gaps 0;

Db 86 NLRVLHLEDNQVSVIERGAFQDLKQLERLRLNKNKLQVLPELLFQSTPKLTRLDLSENQI 145

Qy 1 HLRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQI 60

Db 146

Qy 61 QAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMP 120

Db 206

Qy 121 KLRTFRLHSNNLYC 134

RESULT 2

ID 088279 PRELIMINARY; PRT; 1531 AA.

AC 088279; 

DT 01-NOV-1998 (TREMBLREL. 08, CREATED) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE MEGF4.

GN MEGF4.

OS RATTUS NORVEGICUS (RAT).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;

OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN; 

RX MEDLINE; 98360089. 

RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;

RT "Identification of high-molecular-weight proteins with multiple

RT EGF-like motifs by motif-trap screening."; 

RL GENOMICS 51:27-34(1998). 

DR EMBL; AB011530; D1033423; -. 

DR PROSITE; PS01185; CTCK_1; 1.

DR PROSITE; PS01186; EGF_2; 8.

DR PROSITE; PS01187; EGF_CA; 2.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

SQ SEQUENCE 1531 AA; 167497 MW; 5C5EBDF4 CRC32; 

Query Match 80.0%; Score 777; DB 11; Length 1531; 

Best Local Similarity 77.6%; Pred. No. 4.52e-102;

Matches 104; Conservative 18; Mismatches 12; Indels 0; Gaps 0; 

Db   86 QLRVLQLMENQIGAVERGAFDDMKELERLRLNRNQLQVLPELLFQNNQALSRLDLSENSL 145
        :|||||||||:|:::|||||:|:||||||||||| ||::|||||     | ||||||| :
Qy    1 HLRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQI 60

Db  146 QAVPRKAFRGATDLKNLQLDKNQISCIEEGAFRALRGLEVLTLNNNNITTIPVSSFNHMP 205
        ||:|||||||| |:|||||| |||||||:||||||| |||||||||||| ::|:||||||
Qy   61 QAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMP 120

Db  206 KLRTFRLHSNHLFC 219
        ||||||||||:|:|
Qy  121 KLRTFRLHSNNLYC 134



RESULT 3 

ID Q92626 PRELIMINARY; PRT; 1496 AA. 
AC Q92626; 

DT 01-FEB-1997 (TREMBLREL. 02, CREATED) 

DT 01-FEB-1997 (TREMBLREL. 02, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MYELOBLAST KIAA0230 (FRAGMENT).
GN KIAA0230.
OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]

RP SEQUENCE FROM N.A.
RC TISSUE=BONE MARROW;
RX MEDLINE; 97191544.

RA NAGASE T., SEKI N., ISHIKAWA K., OHIRA M., KAWARABAYASI Y., OHARA O.,
RA TANAKA A., KOTANI H., MIYAJIMA N., NOMURA N.; 

RT "Prediction of the coding sequences of unidentified human genes. VI. 

RT The coding sequences of 80 new genes (KIAA0201-KIAA0280) deduced by 

RT analysis of cDNA clones from cell line KG-1 and brain."; 

RL DNA RES. 3:321-329(1996).

DR EMBL; D86983; D1013908; -. 

DR PFAM; PF00047; ig; 4. 

DR PFAM; PF00093; vwc; 1. 

DR PFAM; PF00141; peroxidase; 1. 

DR PFAM; PF00560; LRR; 3. 

FT NON_TER 1 1

SQ SEQUENCE 1496 AA; 167209 MW; 5731EE51 CRC32; 

Query Match 35.7%; Score 347; DB 4; Length 1496; 

Best Local Similarity 40.9%; Pred. No. 8.67e-34;



Matches 54; Conservative 26; Mismatches 50; Indels 2; Gaps 2; 



Db 83 ILDLRFNRIREIQPGAFRRLRNLNTLLLNNNQIKRIPSGAFEDLENLKY-LYLYKNEIQS 141
:|: III I: |||: |::|: 1 II 1 : :| | III |:||:
Qy 4 VLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARL-YRLDLSENQIQA 62

Db 142 IDRQAFKGLASLEQLYLHFNQIETLDPDSFQHLPKLERLFLHNNRITHLVPGTFNHLESM 201
1 1 11:1 : 1 1 :IH :|: 1 II 1 1:11 ||:| ::IM: :
Qy 63 IPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPKL 122

Db 202 KRLRLDSNTLHC 213
: :l || | |
Qy 123 RTFRLHSNNLYC 134


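The Best Local Similarity figure in each alignment block appears to be the number of exact matches divided by the total number of aligned columns (Matches + Conservative + Mismatches + Indels). This relationship is inferred from the counts printed in the listing and is not stated by the search program, so the Python sketch below is only a consistency check. The function name is hypothetical.

```python
def best_local_similarity(matches: int, conservative: int,
                          mismatches: int, indels: int) -> float:
    """Exact matches as a percentage of aligned columns (assumed formula)."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


# Counts copied from two alignment blocks in this listing.
# The Q92626 block lists 40.9% and the 075139 block lists 32.3%.
print(best_local_similarity(54, 26, 50, 2))
print(best_local_similarity(43, 34, 54, 2))
```

Both calls reproduce the percentages printed in the corresponding blocks, which supports the assumed formula without proving it.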

RESULT 4 

ID Q18902 PRELIMINARY; PRT; 1066 AA.

AC Q18902; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE CODED FOR BY C. ELEGANS CDNA YK132E5.5. 

GN C56E6.6. 

OS CAENORHABDITIS ELEGANS. 

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA;

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS.

RN [1]

RP SEQUENCE FROM N.A. 

RX MEDLINE; 94150718. 

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,

RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,

RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,

RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,

RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,

RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.

RT elegans.";

RL NATURE 368:32-38(1994). 

RN [2] 

RP SEQUENCE FROM N.A.

RA FULTON L.;

RL SUBMITTED (NOV-1995) TO EMBL/GENBANK/DDBJ DATA BANKS. 

RN [3] 

RP SEQUENCE FROM N.A. 

RA WATERSTON R.;

RL SUBMITTED (NOV-1995) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; 039996; G1055120; -. 

DR PFAM; PF00560; LRR; 18. 

SQ SEQUENCE 1066 AA; 122109 MW; 20D7DDEF CRC32; 

Query Match 28.2%; Score 274; DB 5; Length 1066;

Best Local Similarity 31.6%; Pred. No. 6.72e-23;

Matches 42; Conservative 37; Mismatches 52; Indels 2; Gaps 2; 

Db 392 LRHLMLDNNQIQKIDNFSLADLPKLQHLSLAGNQLDIITENMFGSSSSSELKSLNLAHNK 451 

II I I :|:| I: :: II |::| I I |::: I :| ::: I |:|: I 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLF-LGTAR-LYRLDLSENQ 59 

Db 452 IHSISSRSFSDLDNLQQLRLSHNNIRTITSMTFSNLRNLRYLDLSHNRIIKILPSALYQL 511 

|::|: "I :: hi II I :| ||:| I |::| I :: ::: :: 
Qy 60 IQAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHM 119 

Db 512 PALDVLHLDHNNL 524 

I I ::l III 

Qy 120 PKLRTFRLHSNNL 132 



RESULT 5 

ID 075139 PRELIMINARY; PRT; 811 AA. 

AC 075139;

DT 01-NOV-1998 (TREMBLREL. 08, CREATED)

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE KIAA0644 PROTEIN.

GN KIAA0644.

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;

OC CATARRHINI; HOMINIDAE; HOMO.

RN [1]

RP SEQUENCE FROM N.A.

RC TISSUE=BRAIN;

RX MEDLINE; 98403880.

RA ISHIKAWA K., NAGASE T., SUYAMA M., MIYAJIMA N., TANAKA A., KOTANI H.,

RA NOMURA N., OHARA O.;

RT "Prediction of the coding sequences of unidentified human genes. X.

RT The complete sequences of 100 new cDNA clones from brain which can

RT code for large proteins in vitro.";

RL DNA RES. 5:169-176(1998).

DR EMBL; AB014544; D1032580; -.

SQ SEQUENCE 811 AA; 88695 MW; C8B8C147 CRC32;





Query Match 27.8%; Score 270; DB 4; Length 811; 

Best Local Similarity 32.3%; Pred. No. 2.58e-22;

Matches 43; Conservative 34; Mismatches 54; Indels 2; Gaps 2; 

Db 133 LRILYANGNEISRLSRGSFEGLESLVKLRLDGNALGALPDAVFAPLGNLLYLHLESNRIR 192

11:1 I II : M:l: I I :l|l: | | :|: :| : | | | |;|: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 193 FLGKNAFAQLGKLRFLNLSANELQPSLRHAATFAPLRSLSSLILSANSLQHLGPRIFQHL 252 

: ::|| :: I I |:: : : ::| :|| I I |: |:: :|: | |: 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQI-SCI-EDGAFRALRDLEVLTLNNNNITRLSVASFNHM 119 

Db 253 PRLGLLSLRGNQL 265 

hi : |::| I 
Qy 120 PKLRTFRLHSNNL 132 



RESULT 6 

ID Q21604 PRELIMINARY; PRT; 610 AA.

AC Q21604;

DT 01-NOV-1996 (TREMBLREL. 01, CREATED)

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)

DT 01-JAN-1999 (TREMBLREL. 09, LAST ANNOTATION UPDATE)

DE M88.6 PROTEIN.

GN M88.6.

OS CAENORHABDITIS ELEGANS.

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA;

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RA SULSTON J.;

RL SUBMITTED (JUN-1994) TO EMBL/GENBANK/DDBJ DATA BANKS. 

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 94150718. 

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,

RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,

RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,

RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,

RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,

RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.

RT elegans.";

RL NATURE 368:32-38(1994). 

DR EMBL; Z34802; E1348350; -. 

SQ SEQUENCE 610 AA; 68394 MW; 515CAA24 CRC32; 



Query Match 27.5%; Score 267; DB 5; Length 610; 

Best Local Similarity 35.5%; Pred. No. 7.07e-22; 

Matches 43; Conservative 28; Mismatches 50; Indels 0; Gaps 0; 

Db 153 LKTLDLAMNKIQEIDVGAFEELKKVEELLLNENDIRVLKTGTFDGMKNLKKLTLQNCNLE 212 

' h 1:1 1:1 |: lll::|| :| I II |::::: II I :| I : :: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 213 IIQKGAFRGLNSLEQLILSNNNLENIDWTIFSALKNLRVLDLGSNKISNVEMKSFPKLEK 272 

I : Nil : I I I : I: I I:: || I :|:|: : : || : | 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121 

Db 273 L 273 

I 

Qy 122 L 122 



RESULT 7 

ID P70193 PRELIMINARY; PRT; 1091 AA. 

AC P70193; 

DT 01-FEB-1997 (TREMBLREL. 02, CREATED) 

DT 01-FEB-1997 (TREMBLREL. 02, LAST SEQUENCE UPDATE)

DT 01-JAN-1999 (TREMBLREL. 09, LAST ANNOTATION UPDATE) 

DE MEMBRANE GLYCOPROTEIN. 

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 96394313. 

RA SUZUKI Y., SATO N., TOHYAMA M., WANAKA A., TAKAGI T.;

RT "cDNA cloning of a novel membrane glycoprotein that is expressed 

RT specifically in glial cells in the mouse brain LIG-I: a protein with 

RT leucine-rich repeats and immunoglobulin-like domains."; 

RL J. BIOL. CHEM. 271:22522-22527(1996).

DR EMBL; D78572; D1012081; -. 

DR MGD; MGI:107935; IMG. 

DR PFAM; PF00047; ig; 3. 

DR PFAM; PF00560; LRR; 7. 

KW MEMBRANE 

SQ SEQUENCE 1091 AA; 119283 MW; C0F262F9 CRC32;

Query Match 27.5%; Score 267; DB 11; Length 1091;

Best Local Similarity 35.3%; Pred. No. 7.07e-22;

Matches 47; Conservative 37; Mismatches 47; Indels 2; Gaps 2; 

Db 166 RIRELNLASNRISILESGAFDGLSRSLLTLRLSKNRITQLPVKAF-KLPRLTQLDLNRNR 224 

::| I I Mil ! Ill: I : I III:: : :| I :|| :|||: |: 
Qy 1 HLRVLQLMENRISTIERGAFQDL-KELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQ 59

Db 225 IRLIEGLTFQGLDSLEVLRLQRNNISRLTDGAFWGLSKMHVLHLEYNSLVEVNSGSLYGL 284 

I: I :|:l : hi: I II : lll|::| : II |: |:: :: :|: : 
Qy 60 IQAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHM 119 

Db 285 TALHQLHLSNNSI 297 

I: "I :|:: 
Qy 120 PKLRTFRLHSNNL 132 



RESULT 8 

ID 075473 PRELIMINARY; PRT; 907 AA. 

AC 075473; 

DT 01-NOV-1998 (TREMBLREL. 08, CREATED) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE ORPHAN G PROTEIN-COUPLED RECEPTOR HG38.

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 

OC CATARRHINI; HOMINIDAE; HOMO.

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 98308104. 

RA MCDONALD T., WANG R., BAILEY W., XIE G., CHEN F., CASKEY C.T.,

RA LIU Q.;

RT "Identification and cloning of an orphan G protein-coupled receptor

RT of the glycoprotein hormone receptor subfamily.";

RL BIOCHEM. BIOPHYS. RES. COMMUN. 247:266-270(1998).

DR EMBL; AF062006; G3366802; -.

SQ SEQUENCE 907 AA; 99997 MW; B9147406 CRC32; 

Query Match 26.9%; Score 261; DB 4; Length 907; 

Best Local Similarity 36.6%; Pred. No. 5.26e-21;

Matches 48; Conservative 24; Mismatches 59; Indels 0; Gaps 0; 

Db 116 LKVLMLQNNQLRHVPIEALQNLRSLQSLRLDANHISYVPPSCFSGLHSLRHLWLDDNALT 175 

hll I :h: : |:|:|: |: III: |:: I I I :| I :| : 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61

Db 176 EIPVQAFRSLSALQAMTIALNKIHHIPDYAFGNLSSLVVLHLHNNRIHSLGKKCFDGLHS 235

II III: : : I II I I II I I II hll I I: |: :
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121

Db 236 LETLDLNYKNL 246
I h h III
Qy 122 LRTFRLHSNNL 132



RESULT 9

ID 093233 PRELIMINARY; PRT; 331 AA. 

AC 093233; 

DT 01-NOV-1998 (TREMBLREL. 08, CREATED)

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE PHOSPHOLIPASE A2 INHIBITOR. 

OS GLOYDIUS BLOMHOFFII (MAMUSHI).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; LEPIDOSAURIA; SQUAMATA; 

OC SCLEROGLOSSA; SERPENTES; COLUBROIDEA; VIPERIDAE; CROTALINAE; 

OC AGKISTRODON. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=LIVER; 

RX MEDLINE; 98344034. 

RA OKUMURA K., OHKURA N., INOUE S., IKEDA K., HAYASHI K.;

RT "A novel phospholipase A2 inhibitor with leucine-rich repeats from 

RT the blood plasma of Agkistrodon blomhoffii siniticus (subtitle: 

RT Sequence homologies with human leucine-rich alpha2-glycoprotein).";

RL J. BIOL. CHEM. 273:19469-19475(1998). 

DR EMBL; AB007198; D1032956; -. 

KW PHOSPHOLIPASE A2 INHIBITOR. 

SQ SEQUENCE 331 AA; 37091 MW; D764D70F CRC32; 

Query Match 26.8%; Score 260; DB 13; Length 331;

Best Local Similarity 32.8%; Pred. No. 7.34e-21;
Matches 44; Conservative 33; Mismatches 57; Indels 0; Gaps 0; 

Db 80 NLQELHLSNNRLKTLPSGLFRNLPQLHTLDLSTNHLEDLPPEIFTNASSLILLPLSENQL 139 

:|: 1:1 :||: h I |::| :| I |: |:|: :| :| :: I I MM: 
Qy 1 HLRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQI 60 

Db 140 AELHPSWFQTLGELRILGLDHNQVKEIPISCFDKLKKLTSLDLSFNLLRRLAPEMFSGLD 199 

: I: ::: Mill: I : I |: I I |: I : l|: |: : 
Qy 61 QAIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMP 120 

Db 200 NLEKLILESNPIQC 213 

: : I || : | 
Qy 121 KLRTFRLHSNNLYC 134 



RESULT 10 

ID Q26388 PRELIMINARY; PRT; 1385 AA.

AC Q26388; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE TLR-TOLL-LIKE RECEPTOR.

GN TLR. 



OS DROSOPHILA MELANOGASTER (FRUIT FLY). 

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA. 

RN [1] 

RP SEQUENCE FROM N.A.

RX MEDLINE; 95151581.

RA CHIANG C., BEACHY P.A.;

RT "Expression of a novel Toll-like gene spans the parasegment boundary 

RT and contributes to hedgehog function in the adult eye of 

RT Drosophila."; 

RL MECH. DEV. 47:225-239(1994).

DR EMBL; S76155; G913248; -. 

DR FLYBASE; FBgn0004364; 18w. 

DR PFAM; PF00560; LRR; 13. 

SQ SEQUENCE 1385 AA; 154848 MW; 60273533 CRC32; 

Query Match 26.2%; Score 254; DB 5; Length 1385; 

Best Local Similarity 32.1%; Pred. No. 5.41e-20;

Matches 42; Conservative 33; Mismatches 55; Indels 1; Gaps 1;

Db 359 LQILDMRNNSIGHIEEGAFLPLYNLHTLNLAENRLHTLDNRIFNGLYVLTKLTLNNNLVS 418 

MM:: :| 1 : 1 1 1 1 1 I : I I I I I : : : : I I I :| |::| : 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 419 IVESQAFRNCSDLRELDLSSNQLTEVPEAA-QDLSMLKTLDLGENQISEFKNNTFRNLNQ 477 

: III 1:1:1:1 II:: : ::| : I I I I : I I : : :| :: 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121 

Db 478 LTGLRLIDNRI 488 

I Ml I : 
Qy 122 LRTFRLHSNNL 132 



RESULT 11 

ID Q24591 PRELIMINARY; PRT; 1389 AA.

AC Q24591; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE WHEELER. 

GN WHEELER. 

OS DROSOPHILA MELANOGASTER (FRUIT FLY). 

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 95324375. 

RA ELDON E., KOOYER S., D'EVELYN D., DUMAN M., LAWINGER P., BOTAS J.,

RA BELLEN H.;

RT "The Drosophila 18 wheeler is required for morphogenesis and has 

RT striking similarities to Toll."; 

RL DEVELOPMENT 120:885-899(1994). 

DR EMBL; L23171; G1019104; -.

DR FLYBASE; FBgn0004364; 18w. 

DR PFAM; PF00560; LRR; 13. 

SQ SEQUENCE 1389 AA; 155260 MW; 8F0E6F5A CRC32; 

Query Match 25.7%; Score 250; DB 5; Length 1389; 

Best Local Similarity 31.3%; Pred. No. 2.04e-19;
Matches 41; Conservative 33; Mismatches 56; Indels 1; Gaps 1;

Db 359 LQILDMRNNSIGHIEEGAFLPLYNLHTLNLAENRLHTLDNRIFNGLYVLTKLTLNNNLVS 418 

MM:: : 1 : 1 1 1 1 1 I : I I I I I : : : : I I I M MM : 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 419 IVESQAFRNCSDLKELDLSSNQLTEVPEAV-QDLSMLKTLDLGENQISEFKNNTFRNLNQ 477 

: III 1:1:1:1 II:: : : : : I I I I : I I : : :| :: 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121 

Db 478 LTGLRLIDNRI 488 

I M | : 
Qy 122 LRTFRLHSNNL 132



RESULT 12 

ID Q93373 PRELIMINARY; PRT; 738 AA.

AC Q93373;

DT 01-FEB-1997 (TREMBLREL. 02, CREATED)

DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE)

DT 01-JAN-1999 (TREMBLREL. 09, LAST ANNOTATION UPDATE)

DE C44H4.2 PROTEIN.

GN C44H4.2.

OS CAENORHABDITIS ELEGANS.

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA;

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS.

RN [1]

RP SEQUENCE FROM N.A.

RA SMYE R.;

RL SUBMITTED (AUG-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.

RN [2]

RP SEQUENCE FROM N.A.

RX MEDLINE; 94150718.

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,

RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,

RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,

RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,

RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,

RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C. 

RT elegans."; 

RL NATURE 368:32-38(1994). 

DR EMBL; Z79598; E1344562; -. 

SQ SEQUENCE 738 AA; 81830 MW; 868D9B70 CRC32; 

Query Match 25.3%; Score 246; DB 5; Length 738;

Best Local Similarity 29.0%; Pred. No. 7.65e-19;

Matches 38; Conservative 41; Mismatches 52; Indels 0; Gaps 0; 

Db 151 IQTINLGHNNMTAVPSSAIRGLKQLQSLHLHKNRIEQLDALNFLNLPVLNLLNLAGNQIH 210 

: I I :::: :|:: ||:|: |:|::| :: : III : I |:|: |||: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 211 ELNRQAFLNVPSLRYLYLSGNKITKLTAYQFQTFEQLEMLDLTNNEIGAIPANSLSGLKQ 270 
* : I II I I I h : !::: :||:| I ||:| :; :: : 

Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121

Db 271 LRQLYLAHNKI 281 

II : I I:: 

Qy 122 LRTFRLHSNNL 132 



RESULT 13 

ID Q24250 PRELIMINARY; PRT; 733 AA. 

AC Q24250; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE TARTAN PROTEIN PRECURSOR. 

GN TRN. 

OS DROSOPHILA MELANOGASTER (FRUIT FLY).

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;

OC DROSOPHILIDAE; DROSOPHILA.

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=CANTON S;

RX MEDLINE; 94074761.

RA CHANG I., PRICE B.D., BOCKHEIM S., BOEDIGHEIMER M.J., SMITH R.,

RA LAUGHON A.;

RT "Molecular and genetic characterization of the Drosophila tartan

RT gene.";

RL DEV. BIOL. 160:315-332(1993). 

DR EMBL; U02078; G408375; -. 

DR FLYBASE; FBgn0010452; trn. 

DR PFAM; PF00560; LRR; 6. 

KW SIGNAL.

FT SIGNAL 1 24 POTENTIAL.

SQ SEQUENCE 733 AA; 81319 MW; DA426BB0 CRC32;

Query Match 25.2%; Score 245; DB 5; Length 733;

Best Local Similarity 34.8%; Pred. No. 1.06e-18;

Matches 46; Conservative 31; Mismatches 54; Indels 1; Gaps 1; 

Db 201 LAELFLGMNTLQSIQAGAFQDLKGLTRLELKGASLRNISHDSFLGLQELRILDLSDNRLD 260 

I II I : :|: lllllll I II |: :|: :: III I ||||:|::: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 261 RIPSVGLSKLVRLEQLSLGQNDFEVISEGAFMGLKQLKRLEVNGALRLKRVMTGAFSDNG 320 

II I : I I I:: I :||| :|::| I :| : h ::|: 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNN-NITRLSVASFNHMP 120 

Db 321 NLEYLNLSSNKM 332 

:| : I II:: 
Qy 121 KLRTFRLHSNNL 132 



RESULT 14 

ID 070211 PRELIMINARY; PRT; 603 AA. 

AC 070211; 

DT 01-AUG-1998 (TREMBLREL. 07, CREATED)

DT 01-AUG-1998 (TREMBLREL. 07, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE INSULIN-LIKE GROWTH FACTOR BINDING PROTEIN COMPLEX ACID-LABILE

DE SUBUNIT.

OS RATTUS NORVEGICUS (RAT).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;

OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=FISHER;

RX MEDLINE; 98121980. 

RA DELHANTY P.J., BAXTER R.C.; 

RT "Cloning and characterization of the rat gene for the acid-labile 

RT subunit of the insulin-like growth factor binding protein complex."; 

RL J. MOL. ENDOCRINOL. 19:267-277(1997).

DR EMBL; AF006203; G3093474; 

SQ SEQUENCE 603 AA; 66924 MW; 74E63165 CRC32; 

Query Match 24.8%; Score 241; DB 11; Length 603; 

Best Local Similarity 30.5%; Pred. No. 3.97e-18;

Matches 40; Conservative 32; Mismatches 59; Indels 0; Gaps 0; 

Db 220 LRELDLSRNALRSVKANVFVHLPRLQKLYLDRNLITAYAPRAFLGMKALRWLDLSHNRVA 279 

II 1:1 I : :: II M hll : : III I :||ll I:: 
Qy 2 LRVLQLMENRISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQ 61 

Db 280 GLMEDTFPGLLGLHVLRLAHNAIASLRPRTFKDLHFLEELQLGHNRIRQLGERTFEGLGQ 339 

" :M : : hi I |: : :|: |: II I I :| I :|: :|: : 
Qy 62 AIPRKAFRGAVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPK 121 

Db 340 LEVLTLNDNQI 350 

I : I: I : 
Qy 122 LRTFRLHSNNL 132 



RESULT 15 

ID Q22187 PRELIMINARY; PRT; 683 AA. 

AC Q22187; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED)

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-JAN-1999 (TREMBLREL. 09, LAST ANNOTATION UPDATE) 

DE T05A1.3 PROTEIN. 

GN T05A1.3. 

OS CAENORHABDITIS ELEGANS.

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA;

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS.

RN [1]

RP SEQUENCE FROM N.A.

RA LLOYD C.;

RL SUBMITTED (DEC-1995) TO EMBL/GENBANK/DDBJ DATA BANKS.

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 94150718. 

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,

RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,

RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,

RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,

RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,

RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.

RT elegans.";

RL NATURE 368:32-38(1994). 

DR EMBL; Z68219; E1349116; -. 

SQ SEQUENCE 683 AA; 77437 MW; B9B00EA2 CRC32; 

Query Match 24.6%; Score 239; DB 5; Length 683; 

Best Local Similarity 29.5%; Pred. No. 7.66e-18; 

Matches 36; Conservative 42; Mismatches 42; Indels 2; Gaps 2; 

Db 81 KVSTPPHSLFQGFRNLDRLELDRCLIDTVPEGLFAGLGQLYSLIVKNAKITDFPREIFAH 140

Qy 11 RISTIERGAFQDLKELERLRLNRNNLQLFPELLFLGTARLYRLDLSENQIQAIPRKAFRG 70

Db 141 VPNLMTLDLSGNRLR-IEPYSLRSLQNLIHLDVSDNDIGFLT-NTLISLTKLKVITMNNN 198

::: \-\ \-- II I :::|:| |: :: : ||: : ::;|
Qy 71 AVDIKNLQLDYNQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPKLRTFRLHSN 130

Db 199 KI 200

Qy 131 NL 132



Search completed: Fri May 28 09:32:09 1999 
Job time: 52 secs.

Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm



Run on: Fri May 28 09:34:00 1999; MasPar time 6

331.06? Million cell updates/sec

Tabular output not generated.



Title:          MJS-09-191-647-13
Description:    (1-104) from US09191647.pep
Perfect Score:  807
Sequence:       1 NNDDCVGHKCRHGAQCVDEV ITVNFVGKDSYVELASAKVR 104

Scoring table:  PAM 150
                Gap 11

Searched:       170751 seqs, 21266608 residues



Post-processing: Minimum Match 0% 

Listing first 45 summaries 



1:part1 2:part2 3:part3 4:part4 5:part5 6:part6 7:part7
8:part8 9:part9 10:part10 11:part11 12:part12 13:part13
14:part14 15:part15 16:part16 17:part17 18:part18
19:part19 20:part20 21:part21 22:part22 23:part23
24:part24 25:part25 26:part26 27:part27 28:part28
29:part29 30:part30 31:part31 32:part32 33:part33
34:part34 35:part35 36:part36 37:part37 38:part38
39:part39

Statistics: Mean 27.273; Variance 127.182; scale 0.214

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.


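As a cross-check on the summary columns, the Query Match figure appears to be the score expressed as a percentage of the Perfect Score (807 for this query). This relationship is inferred from the listed values and is not stated by the search program, so the Python sketch below is an assumption-based consistency check. The function name is hypothetical.

```python
def query_match_percent(score: int, perfect_score: int) -> float:
    """Score as a percentage of the perfect score, to one decimal (assumed formula)."""
    return round(100.0 * score / perfect_score, 1)


PERFECT_SCORE = 807  # "Perfect Score" reported for query MJS-09-191-647-13

# Scores taken from the summary list for this query.
for score in (436, 343, 209, 152):
    print(score, query_match_percent(score, PERFECT_SCORE))
```

The values printed for these four scores agree with the Query Match column of the summaries (54.0, 42.5, 25.9 and 18.8).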

Result        Query
   No. Score  Match Length   DB  ID      Description            Pred. No.
     1   436   54.0   1534   30  W46966  Amino acid sequence o  1.73e-28
     2   343   42.5   1480    5  R25079  Drosophila SLIT prote  2.51e-20
     3   209   25.9   1193   19  W05835  Chick Serrate.         7.17e-09
     4   207   25.7    612   28  W39256  Human partial mature   1.05e-08
     5   207   25.7    737   28  W39257  Human membrane protei  1.05e-08
     6   205   25.4   1872   36  W68510  Partial human Notch-3  1.54e-08
     7   205   25.4   2321   36  W49698  Human Notch3 protein.  1.54e-08
     8   203   25.2   1055   29  W44298  Human serrate 2 prote  2.26e-08
     9   203   25.2   1212   29  W44299  Human serrate 2.       2.26e-08
    10   203   25.2   1257   19  W05834  Human Serrate-2 (HJ2)  2.26e-08
    11   200   24.8    685   37  W80813  Nucleotide sequence o  4.01e-08
    12   190   23.5   1404    7  R38304  Sequence of a serrate  2.69e-07
    13   188   23.3    722   21  W11720  M-Delta-1 polypeptide  3.93e-07
    14   188   23.3   1036   25  W18351  Proliferation and dif  3.93e-07
    15   188   23.3   1187   25  W18352  Proliferation and dif  3.93e-07
    16   188   23.3   1218   29  W44301  Human serrate 1.       3.93e-07
    17   188   23.3   1218   19  W05833  Human Serrate-1 (HJ1)  3.93e-07
    18   188   23.3   1218   25  W18354  Proliferation and dif  3.93e-07
    19   186   23.0    520   25  W18348  Proliferation and dif  5.73e-07
    20   186   23.0    702   25  W18349  Proliferation and dif  5.73e-07
    21   186   23.0    723   25  W18353  Proliferation and dif  5.73e-07
    22   184   22.8    385   10  R56167  Neuroendocrine tumor   5.37e-07
    23   183   22.7   1208   28  W40827  Human Jagged protein.  1.01e-06
    24   178   22.1    727   21  W11719  C-Delta-1 polypeptide  2.59e-06
    25   178   22.1    740   21  W00876  C-Delta-1 polypeptide  2.59e-06
    26   172   21.3    660   21  W11725  H-Delta-1 polypeptide  7.98e-06
    27   171   21.2    487   11  R60518  Cattle Factor-Xa.      9.62e-06
    28   171   21.2    492   11  R60502  Serine protease for f  9.62e-06
    29   171   21.2    833    6  R28960  Delta Dll.             9.62e-06
    30   164   20.3    383   10  R56166  Neuroendocrine tumor   3.55e-05
    31   157   19.5    157   21  W11730  H-Delta-1 polypeptide  1.30e-04
    32   155   19.2    467   29  W40283  Human Factor X protea  1.87e-04
    33   152   18.8    415    7  R35761  Factor IX (IX).        3.26e-04
    34   152   18.8    448    7  R37402  Factor X.              3.26e-04
    35   152   18.8    448   35  W66092  Human factor X varian  3.26e-04
    36   152   18.8    454   12  R67710  Human Factor-IX.       3.26e-04
    37   152   18.8    461    3  P50302  Sequence of human fac  3.26e-04
    38   152   18.8    461   29  W40284  Human Factor IX prote  3.26e-04
    39   152   18.8    462    3  R10868  Recombinant human fac  3.26e-04
    40   152   18.8    488   36  W76218  Human Factor X protei  3.26e-04
    41   152   18.8    488    4  R22512  Mutated precursor of   3.26e-04
    42   152   18.8    488    4  R22511  Human Factor Xai.      3.26e-04
    43   152   18.8    488   36  W76216  Human Factor X protei  3.26e-04
    44   152   18.8    488   36  W76219  Human Factor X protei  3.26e-04
    45   152   18.8    488   36  W76217  Human Factor X protei  3.26e-04



RESULT 1

ID W46966 standard; Protein; 1534 AA.

AC W46966;

DT 06-JUL-1998 (first entry)

DE Amino acid sequence of a human slit-like polypeptide. 

KW Slit-like protein; human; diagnosis; treatment; brain-specific disease; 

KW cancer; antibody. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT Peptide 1..26

FT /note= "signal peptide"

FT Protein 27..1534

FT /note= "mature protein"

PN J10087699-A. 

PD 07-APR-1998.

PF 15-JUL-1997; 205351. 

PR 16-JUL-1996; JP-186219. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

DR WPI; 98-267127/24. 

DR N-PSDB; V16978. 

PT Human Slit-like protein - useful for diagnosis and treatment of 

PT brain-specific diseases and cancers 

PS Disclosure; Pages 31-35; 45pp; Japanese.

CC The present sequence represents a novel human slit-like protein (the

CC mature protein is claimed in Claim 1). The slit-like polypeptide is 

CC useful for diagnosis and treatment of brain-specific diseases and 

CC cancers. Antibodies directed against the protein, or its fragments 

CC can also be used for diagnosing cancer. 

SQ Sequence 1534 AA; 



Query Match 54.0%; Score 436; DB 30; Length 1534; 

Best Local Similarity 52.5%; Pred. No. 1.73e-28; 

Matches 52; Conservative 24; Mismatches 22; Indels 1; Gaps 1; 

Db 1083 nqddcrdhrcqngaqcmdevnsysclcaegysgqlceipphlpa-pkspcegtecqngan 1141 

I III :|::|:i|:'i:i:|:':|::|:l :|| || : |||: HUM 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60

Db 1142 cvdqgnrpvcqclpgfggpecekllsvnfvdrdtylqft 1180 
{ I: I 1:1 llhll llll::IHI :|:|:::: 
Qy 61 CIVVQQEPTCRCPPGFAGPRCEKLITVNFVGKDSYVELA 99
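
The Matches and Indels figures reported for RESULT 1 can be checked against the printed alignment. The following is a minimal sketch, not part of the search program: the function name `alignment_counts` is hypothetical, and it assumes the second query block reads CIVVQQEPT... (39 residues for positions 61-99, the OCR having merged "VV" into "W").

```python
def alignment_counts(db_blocks, qy_blocks):
    """Return (identical columns, gap columns, total columns) for an alignment."""
    db = "".join(db_blocks).upper()
    qy = "".join(qy_blocks).upper()
    if len(db) != len(qy):
        raise ValueError("aligned strings must have equal length")
    # A column is an identity when both residues agree and neither is a gap.
    identities = sum(1 for d, q in zip(db, qy) if d == q and d != "-")
    # A column is a gap column when either sequence shows '-'.
    gap_columns = sum(1 for d, q in zip(db, qy) if d == "-" or q == "-")
    return identities, gap_columns, len(db)


# Db and Qy blocks as printed for RESULT 1 (W46966).
db_blocks = [
    "nqddcrdhrcqngaqcmdevnsysclcaegysgqlceipphlpa-pkspcegtecqngan",
    "cvdqgnrpvcqclpgfggpecekllsvnfvdrdtylqft",
]
qy_blocks = [
    "NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ",
    "CIVVQQEPTCRCPPGFAGPRCEKLITVNFVGKDSYVELA",
]

identities, gap_columns, columns = alignment_counts(db_blocks, qy_blocks)
print(identities, gap_columns, columns)
```

Under these assumptions the identity count agrees with the reported "Matches 52" and the single gap column with "Indels 1". Conservative substitutions depend on the PAM 150 table and are not recomputed here.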



RESULT 2 

ID R25079 standard; Protein; 1480 AA. 

AC R25079; 

DT 05-JAN-1993 (first entry) 

DE Drosophila SLIT protein involved in axon pathway development. 

KW Neurogenesis; EGF-like repeats; epidermal growth factor; TAGON; 

KW embryonic CNS; leucine-rich repeat; Flank-LRR-Flank; 

KW midline glial cells; axonogenesis; cell-cell interaction; ss. 

OS Drosophila melanogaster. 

FH Key Location/Qualifiers 

FT peptide 1..36

FT /label= signal

FT domain 73..294

FT /label= Flank_LRR_Flank_1

FT /note= "mediates adhesive events"

FT domain 295..518

FT /label= Flank_LRR_Flank_2

FT /note= "mediates adhesive events"

FT domain 519..714

FT /label= Flank_LRR_Flank_3

FT /note= "mediates adhesive events"

FT domain 715..910

FT /label= Flank_LRR_Flank_4

FT /note= "mediates adhesive events"

FT region 911..1150

FT /label= Tandem_EGF_like_repeats

FT /note= "involved in protein-protein interactions"

FT region 1353..1393

FT /label= 7th_EGF_like_repeat

FT /note= "involved in receptor-ligand interactions"

FT region 1394..1404

FT /label= alternative_splice_segment

FT /note= "developmentally regulated"

FT region 1405..1480

FT /label= C-terminal_region
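
Feature locations in these records take the form start..end, but scanned copies often show stray spaces or commas in place of the dots. The sketch below is one hedged way to normalise such a location; `parse_location` is a hypothetical helper, and the tolerance rules (two separator characters, each a dot or comma, with optional spaces) are an assumption based on the damage seen in this report.

```python
import re

# start, two separator characters (dot or comma), end, optional trailing dot.
_RANGE_RE = re.compile(r"^([\d ]+?)\s*[.,]\s*[.,]\s*([\d ]+?)\s*[.,]?$")


def parse_location(text):
    """Return (start, end) for a 'start..end' location, or None if not a range."""
    match = _RANGE_RE.match(text.strip())
    if match is None:
        return None
    # Remove spaces that the scan inserted inside a number, e.g. '14 05'.
    start = int("".join(match.group(1).split()))
    end = int("".join(match.group(2).split()))
    return start, end


for raw in ["1..36", "73,. 294", "29 5 .,518", "14 05,, 14 80", "328"]:
    print(repr(raw), "->", parse_location(raw))
```

Single positions such as "328" are deliberately returned as None so that the caller can decide how to treat them.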

PN WO9210518-A. 

PD 25-JUN-1992.

PF 27-NOV-1991; U09055.

PR 07-DEC-1990; US-624135.

PA (UYYA ) UNIV YALE.

PI Artavanis-Tsakonas S, Rothberg JM;

DR WPI; 92-234590/28.

DR N-PSDB; Q25811.

PT SLIT protein and sequence elements for treating

PT neurodegenerative disease - useful for Alzheimer's disease,

PT nerve damage and Parkinson's disease, for diagnosis of cancer

PS Claim 1; Page 84-89; 122pp; English.

CC The SLIT protein is necessary for normal development of the midline

CC of the CNS, partic. the midline glial cells, and for the

CC concomitant formation of the commissural axon pathways. The process

CC is dependent on the level of SLIT protein expression. It appears

CC that SLIT protein is excreted by the midline glial cells where it

CC is synthesised and is eventually associated with the surfaces of

CC axons that traverse them. The SLIT protein is tightly localised to

CC the muscle attachment sites and to the sites of contact between

CC adjacent pairs of cardioblasts as they coalesce to form the lumen

CC of the larval heart. The SLIT protein defines a new set of

CC molecules (TAGONS) which play a key role in axon outgrowth and

CC pathfinding. SLIT can be used as a nerve regenerative in

CC neurodegenerative diseases such as Alzheimer's Disease, spinal cord

CC injuries, brain injuries, crushed (optic) nerve, amyotrophic lateral

CC sclerosis, diabetes-caused nerve damage, Parkinson's Disease,

CC strokes, epilepsy, multiple sclerosis, paraplegia, retinal

CC degeneration and facial nerve damage. The 4 "Flank-Leucine-rich

CC region-Flank" domains, the C-terminal region and part of the

CC alternative splice segment (i.e. GEGSTEPFTVT) are all individually

CC claimed, as are molecules comprising at least 1 Flank-LRR-Flank domain

CC and at least 1 EGF-like repeat element from the SLIT protein.

CC See also R29102.

SQ Sequence 1480 AA;



Query Match 42.5%; Score 343; DB 5; Length 1480;

Best Local Similarity 42.2%; Pred. No. 2.51e-20;

Matches 46; Conservative 26; Mismatches 31; Indels 6; Gaps 5; 

Db 1064 niddcqnhmcqnggtcvdgindyqcrcpddytgkyceghnmismmypqtspcqnheckhg 1123 

I III l:|::h III :| I I II: ::| :|| I :|: Mill: || :| 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCE-HPP-PMVLLQTSPCDQYECQNG 58

Db 1124 v-cfqpnaqgsdylcrchpgytgkwceyltsisfvhnnsfveleplrtr 1171 

I: I : III M::| :|l I "Ml "hill : : I 
Qy 59 AQCIVVQ-QEPT--CRCPPGFAGPRCEKLITVNFVGKDSYVELASAKVR 104
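
The two percentages printed with each result appear to follow from the other figures on the same lines. As a hedged sketch (the formulas are inferred from the numbers in this report, not taken from the program's documentation): Query Match looks like the score divided by the perfect score, and Best Local Similarity like Matches divided by the total number of alignment columns. The perfect score of 807 is taken from the header of the later search with the same 104-residue query; the function names are hypothetical.

```python
def query_match_percent(score, perfect_score):
    """Score as a percentage of the query's perfect (self-match) score."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity(matches, conservative, mismatches, indels):
    """Identical columns as a percentage of all alignment columns."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


# RESULT 2 (R25079): Score 343; Matches 46; Conservative 26;
# Mismatches 31; Indels 6. Perfect score 807 is an assumption (see above).
print(query_match_percent(343, 807))
print(best_local_similarity(46, 26, 31, 6))
```

The same two functions applied to RESULT 1 (Score 436; 52/24/22/1) give figures that agree with the printed 54.0% and 52.5%, which supports the inferred formulas without proving them.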



RESULT 3

ID W05835 standard; Protein; 1193 AA.
AC W05835;
DT 28-JAN-1997 (first entry)
DE Chick Serrate.
KW C-Serrate; Notch; cell differentiation; cell fate; tissue repair;
KW central nervous system; cancer; therapy; diagnosis.
OS Gallus sp.
FH Key Location/Qualifiers
FT domain 1..1041
FT /label= Extracellular_domain
FT peptide 1..5
FT /label= Sig_peptide
FT /note= "lacks the N-terminal portion owing to
FT truncation of the encoding cDNA clone"
FT domain 158..203
FT /label= DSL
FT /note= "region of homology with Drosophila Delta
FT and Serrate, predicted to mediate binding
FT with Notch"
FT domain 208..837
FT /label= ELR
FT /note= "epidermal growth factor-like repeat domain"
FT region 208..238
FT /label= ELR1
FT region 239..274
FT /label= ELR2
FT region 275..313
FT /label= ELR3
FT region 314..351
FT /label= ELR4
FT region 352..390
FT /label= ELR5
FT region 391..427
FT /label= ELR6
FT region 428..464
FT /label= ELR7
FT region 465..502
FT /label= ELR8
FT region 503..540
FT /label= ELR9
FT region 541..606
FT /label= ELR10
FT region 607..644
FT /label= ELR11
FT region 655..682
FT /label= ELR12
FT region 683..721
FT /label= ELR13
FT region 722..759
FT /label= ELR14
FT region 760..797
FT /label= ELR15
FT region 798..837
FT /label= ELR16
FT region 854..911
FT /label= Cysteine-rich_region
FT domain 1042..1066
FT /label= Transmembrane_domain
FT domain 1067..1193

FT /label= Intracellular_domain

PN WO9627610-A1. 

PD 12-SEP-1996. 

PF 07-MAR-1996; U03172 . 

PR 07-MAR-1995; US-400159. 

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY. 

PA (UYYA ) UNIV YALE. 

PI Artavanis-Tsakonas S, Gray GE, Henrique DMP, Ish-Horowicz D; 

PI Lewis JH, Mann RS, Myat AM; 

DR WPI; 96-425379/42. 

DR N-PSDB; T40092. 

PT Vertebrate Serrate protein and related DNA - used to treat or

PT prevent malignancies characterised by increased Notch activity.

PS Disclosure; Page 112-115; 161pp; English.

CC Chicken Serrate (W05835), or C-Serrate, is a ligand for the zygotic

CC neurogenic locus Notch and is believed to play a major role in

CC determining cell fates in the central nervous system. Its amino

CC acid sequence was deduced from a cDNA clone (T40092) obtd. from an 

CC optic explant cDNA library. C-Serrate is expressed in the central 

CC nervous system, cranial placodes, nephric mesoderm, vascular 

CC system, and limb bud mesenchyme. 

SQ Sequence 1193 AA; 

Query Match 25.9%; Score 209; DB 19; Length 1193;

Best Local Similarity 37.8%; Pred. No. 7.17e-09;

Matches 31; Conservative 18; Mismatches 26; Indels 7; Gaps 2;

Db 463 necasnpcmngghcqdeingfqclcpagfsgnlc-q------ldidycepnpcqngaqcf 515

"I =: I Ihll: Ml llll :l : I: 1= lllllll: 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62 

Db 516 nlamdyfcncpedyegkncshl 537 

: : I II : I I I 
Qy 63 VVQQEPTCRCPPGFAGPRCEKL 84



RESULT 4 

ID W39256 standard; protein; 612 AA. 

AC W39256; 

DT 19-MAY-1998 (first entry) 

DE Human partial mature membrane protein. 

KW Epidermal growth factor motif; EGF motif; membrane protein; disease; 

KW brain; nervous tissue; cancer.

OS Homo sapiens.

FH Key Location/Qualifiers

FT Protein 1..612

FT /note= "partial mature protein"

PN J10036395-A. 

PD 10-FEB-1998.

PF 24-JUL-1996; 194467.

PR 24-JUL-1996; JP-194467.

PA (ASAH ) ASAHI KASEI KOGYO KK.

DR WPI; 98-174912/16.

PT New human membrane protein - specifically expressed in brain and

PT nervous tissue; used in diagnosis of diseases specific to these 

PT tissues and cancer 

PS Claim 1; Column 18-19; 26pp; Japanese. 

CC W39256 represents the partial mature amino acid sequence of a novel 

CC membrane protein which contains epidermal growth factor (EGF) motifs. 

CC The new membrane protein is expressed specifically in brain and nervous 

CC tissue. The protein and DNA can be used in the diagnosis of brain and 

CC nerve system specific diseases and cancer. 

SQ Sequence 612 AA; 

Query Match 25.7%; Score 207; DB 28; Length 612; 

Best Local Similarity 42.0%; Pred. No. 1.05e-08; 

Matches 34; Conservative 13; Mismatches 27; Indels 7; Gaps 3; 

Db 370 cildpcrngatcisslsgftcqcpegyfgsacee---kv--d--pcasspcqnngtcyvd 422

I: Ihll I: : : I : I! I|:|: III :| : II III : I I 
Qy 5 CVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCIVV 64



Db 423 gvhftcncspgftgptcaqli 443 

II I = I f I : II I II 
Qy 65 QQEPTCRCPPGFAGPRCEKLI 85 



RESULT 5 

ID W39257 standard; protein; 737 AA. 

AC W39257; 

DT 19-MAY-1998 (first entry) 

DE Human membrane protein. 

KW Epidermal growth factor motif; EGF motif; membrane protein; disease; 

KW brain; nervous tissue; cancer; disease. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT Peptide 1..26 

FT /label= signal

FT Protein 27..737

FT /label= membrane_protein 

PN J10036395-A. 

PD 10-FEB-1998. 

PF 24-JUL-1996; 194467. 

PR 24-JUL-1996; JP-194467. 

PA (ASAH ) ASAHI KASEI KOGYO KK.

DR WPI; 98-174912/16. 

DR N-PSDB; V09641. 

PT New human membrane protein - specifically expressed in brain and 

PT nervous tissue; used in diagnosis of diseases specific to these 

PT tissues and cancer 

PS Claim 2; Pages 19-21; 26pp; Japanese. 

CC W39257 represents the amino acid sequence of a novel membrane protein 

CC which contains epidermal growth factor (EGF) motifs. The new membrane 

CC protein is expressed specifically in brain and nervous tissue. The 

CC protein and DNA can be used in the diagnosis of brain and nerve system 

CC specific diseases and cancer. 

SQ Sequence 737 AA; 

Query Match 25.7%; Score 207; DB 28; Length 737; 

Best Local Similarity 42.0%; Pred. No. 1.05e-08; 

Matches 34; Conservative 13; Mismatches 27; Indels 7; Gaps 3; 

Db 396 cildpcrngatcisslsgftcqcpegyfgsacee---kv--d--pcasspcqnngtcyvd 448 

I: Ihll I: ::|:M Ihh I II :| : II III : I I 

Qy 5 CVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCIVV 64

Db 449 gvhftcncspgftgptcaqli 469 

II hllhll I II 

Qy 65 QQEPTCRCPPGFAGPRCEKLI 85 
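
Each database record in this report is a series of lines that start with a two-letter code (ID, AC, DT, DE, KW, OS and so on). A minimal sketch of reading such a record into a dictionary follows; `parse_record` and `keywords` are hypothetical helper names, and the sample text is the opening of the W39257 entry above.

```python
import re
from collections import defaultdict

# Two upper-case letters, whitespace, then the line's content.
_LINE_RE = re.compile(r"^([A-Z]{2})\s+(.*\S)\s*$")


def parse_record(text):
    """Group record lines by their two-letter line code, keeping line order."""
    fields = defaultdict(list)
    for line in text.splitlines():
        match = _LINE_RE.match(line)
        if match:
            fields[match.group(1)].append(match.group(2))
    return dict(fields)


def keywords(fields):
    """Join the KW lines and split them into individual keywords."""
    joined = " ".join(fields.get("KW", []))
    return [k.strip() for k in joined.rstrip(".").split(";") if k.strip()]


sample = """ID W39257 standard; protein; 737 AA.
AC W39257;
DT 19-MAY-1998 (first entry)
DE Human membrane protein.
KW Epidermal growth factor motif; EGF motif; membrane protein; disease;
KW brain; nervous tissue; cancer; disease.
OS Homo sapiens.
"""

record = parse_record(sample)
print(record["ID"][0])
print(keywords(record))
```

Alignment lines ("Db", "Qy") and "RESULT n" headings do not match the two-upper-case-letter pattern, so they are skipped rather than mistaken for record fields.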



RESULT 6

ID W68510 standard; Protein; 1872 AA.

AC W68510;

DT 06-JAN-1999 (first entry)

DE Partial human Notch-3 protein.

KW Human; Notch3; transmembrane receptor; lateral inhibition; regulation; 

KW developmental cascade; neurogenic gene; mutant; neurological disorder; 

KW cerebral autosomal dominant arteriopathy; subcortical infarct; CADASIL; 

KW leukoencephalopathy. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT Misc_difference 328
FT /note= "encoded by NAN"
FT Misc_difference 401
FT /note= "encoded by GNN"
FT Misc_difference 403
FT /note= "encoded by GNC"
FT Misc_difference 406
FT /note= "encoded by GNN"
FT Misc_difference 409
FT /note= "encoded by NNT"
FT Misc_difference 420
FT /note= "encoded by GNC"
FT Misc_difference 706
FT /note= "encoded by NNN"
FT Misc_difference 708
FT /note= "encoded by CCN"
FT Misc_difference 719
FT /note= "encoded by CGN"
FT Misc_difference 728
FT /note= "encoded by CNT"
FT Misc_difference 729
FT /note= "encoded by GTN"
FT Misc_difference 759..789
FT /note= "encoded by NNN"
FT Misc_difference 1425
FT /note= "encoded by GNA"
PN FR2751985-A1.
PD 06-FEB-1998.
PF 01-AUG-1996; 009733.
PR 01-AUG-1996; FR-009733.
PA (INRM ) INSERM INST NAT SANTE & RECH MEDICALE.
PI Bach JF, Bousser MG, Joutel A, Tournier Lasserve E;
DR WPI; 98-133137/13.
DR N-PSDB; V57163.
PT Human Notch3 nucleic acids - and methods for identifying
PT pre-disposition to cerebral autosomal dominant arteriopathy with
PT sub-cortical infarcts and leukoencephalopathy
PS Claim 2; Fig 1a-1g; 42pp; French.
CC This sequence represents a partial human Notch3 protein, a transmembrane
CC receptor protein involved in lateral inhibition and regulating
CC developmental cascades of neurogenic genes. Mutated Notch3 proteins
CC are thought to be involved in neurological disorders, especially of the
CC cerebral autosomal dominant arteriopathy with subcortical infarcts and
CC leukoencephalopathy (CADASIL) type. Blocking expression of a mutated
CC Notch3 gene or by substitution therapy with non-mutated Notch3 gene or
CC protein can be used to treat CADASIL or related disorders.
SQ Sequence 1872 AA;

Query Match 25.4%; Score 205; DB 36; Length 1872;

Best Local Similarity 38.6%; Pred. No. 1.54e-08;

Matches 32; Conservative 20; Mismatches 23; Indels 8; Gaps 5; 

Db 443 decastpcrngakcvdqpdgyecrcaegfegtlcdrn---vd-dcsp-dp--chhg-rcv 494 

1:1 : Ihll III: :|l I |::|l I :|:: I : II I |::| :|: 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62 

Db 495 dgiasfscacapgytgtrcesqv 517 

:| hl|::| III : 
Qy 63 VVQQEPTCRCPPGFAGPRCEKLI 85



RESULT 7 

ID W49698 standard; Protein; 2321 AA. 

AC W49698;

DT 21-DEC-1998 (first entry)

DE Human Notch3 protein.

KW Human; Notch3; transmembrane receptor; lateral inhibition; regulation;

KW developmental cascade; neurogenic gene; mutant; neurological disorder;

KW cerebral autosomal dominant arteriopathy; subcortical infarct; CADASIL;

KW leukoencephalopathy; therapy.

OS Homo sapiens.

PN FR2751986-A1.

PD 06-FEB-1998.

PF 16-APR-1997; 004680.

PR 01-AUG-1996; FR-009733.

PA (INRM ) INSERM INST NAT SANTE & RECH MEDICALE.

PI Bach JF, Bousser MG, Joutel A, Tournier Lasserve E; 

DR WPI; 98-133138/13. 

DR N-PSDB; V57001. 

PT Human Notch3 nucleic acids - and methods for identifying 

PT pre-disposition to cerebral autosomal dominant arteriopathy with 

PT sub-cortical infarcts and leukoencephalopathy 

PS Claim 2; Fig 1.1-1.8; 45pp; French. 

CC This sequence represents the human Notch3 protein, a transmembrane 

CC receptor protein involved in lateral inhibition and regulating 

CC developmental cascades of neurogenic genes. Mutated Notch3 proteins 



CC are thought to be involved in neurological disorders, especially of 

CC the cerebral autosomal dominant arteriopathy with subcortical infarcts 

CC and leukoencephalopathy (CADASIL) type. Blocking expression of a 

CC mutated Notch3 gene or by substitution therapy with non-mutated Notch3

CC gene or protein can be used to treat CADASIL or related disorders.

SQ Sequence 2321 AA;

Query Match 25.4%; Score 205; DB 36; Length 2321;

Best Local Similarity 38.6%; Pred. No. 1.54e-08; 

Matches 32; Conservative 20; Mismatches 23; Indels 8; Gaps 5; 

Db 509 decastpcrngakcvdqpdgyecrcaegfegtlcdrn---vd-dcsp-dp--chhg-rcv 560 

1:1 : Ihll III: :M I |::|| I :|:: I : II I |::| :|: 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62 

Db 561 dgiasfscacapgytgtrcesqv 583 

: |:||::| I : 
Qy 63 VVQQEPTCRCPPGFAGPRCEKLI 85



RESULT 8 

ID W44298 standard; Protein; 1055 AA.

AC W44298; 

DT 19-JUN-1998 (first entry) 

DE Human serrate 2 protein fragment. 

KW Human; serrate 2; regulation; stem cell; differentiation; neoplasm; 

KW leukaemia; endothelial cell; tumour. 

OS Homo sapiens. 

PN WO9802458-A1.

PD 22-JAN-1998.

PF 11-JUL-1997; J02414.

PR 14-MAY-1997; JP-124063.

PR 16-JUL-1996; JP-186220.

PA (ASAH ) ASAHI KASEI KOGYO KK.

PI Itoh A, Sakano S; 

DR WPI; 98-110528/10. 

DR N-PSDB; V15181. 

PT Human serrate-2 gene expression products - used to regulate stem 

PT cell differentiation, useful in treating neoplasms, e.g. leukaemia 

PS Claim 2; Page 57-62; 103pp; Japanese.

CC The present sequence represents a human serrate 2 protein fragment. The

CC present invention also describes a method for the preparation of the

CC polypeptides, and antibodies binding to the polypeptide and its

CC fragments. The polypeptide and its fragments expressed by the serrate-2

CC gene can be used to inhibit stem (especially blood stem) cell

CC differentiation and to inhibit endothelial cell growth. They may be

CC incorporated in a cell culture media for culturing undifferentiated

CC stem cells. They can also be used for treatment of neoplasms such as

CC leukaemia. The antibodies can be used for the diagnosis of malignant

CC tumours.

SQ Sequence 1055 AA;

Query Match 25.2%; Score 203; DB 29; Length 1055;

Best Local Similarity 42.7%; Pred. No. 2.26e-08;

Matches 35; Conservative 15; Mismatches 24; Indels 8; Gaps 2; 

Db 435 nvndcrgq-cqhggtckdlvngyqcvcprgfggrhce-------lerdkcasspchsggl 486

I :N I: I:lh I I I II I hlhlhl II |: I |:;|: 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60 

Db 487 cedladgfhchcpqgfsgplce 508 

I : : hll Ihll It 
Qy 61 CIVVQQEPTCRCPPGFAGPRCE 82



RESULT 9 

ID W44299 standard; Protein; 1212 AA.

AC W44299;

DT 19-JUN-1998 (first entry)

DE Human serrate 2.

KW Human; serrate 2; regulation; stem cell; differentiation; neoplasm;

KW leukaemia; endothelial cell; tumour.

OS Homo sapiens.

PN WO9802458-A1.

PD 22-JAN-1998.

PF 11-JUL-1997; J02414.

PR 14-MAY-1997; JP-124063.

PR 16-JUL-1996; JP-186220.

PA (ASAH ) ASAHI KASEI KOGYO KK.

PI Itoh A, Sakano S;

DR WPI; 98-110528/10.

DR N-PSDB; V15181.

PT Human serrate-2 gene expression products - used to regulate stem 

PT cell differentiation, useful in treating neoplasms, e.g. leukaemia 

PS Claim 3; Page 62-68; 103pp; Japanese. 

CC The present sequence represents human serrate 2. The present invention 

CC also describes a method for the preparation of the polypeptides, and 

CC antibodies binding to the polypeptide and its fragments. The polypeptide 

CC and its fragments expressed by the serrate-2 gene can be used to inhibit

CC stem (especially blood stem) cell differentiation and to inhibit

CC endothelial cell growth. They may be incorporated in a cell culture

CC media for culturing undifferentiated stem cells. They can also be used

CC for treatment of neoplasms such as leukaemia. The antibodies can be used

CC for the diagnosis of malignant tumours.

SQ Sequence 1212 AA;

Query Match 25.2%; Score 203; DB 29; Length 1212;

Best Local Similarity 42.7%; Pred. No. 2.26e-08;

Matches 35; Conservative 15; Mismatches 24; Indels 8; Gaps 2; 

Db 435 nvndcrgq-cqhggtckdlvngyqcvcprgfggrhce-------lerdkcasspchsggl 486

I :ll I: Ml: I I INI |:||:||:| II |: | |::|: 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60 

Db 487 cedladgfhchcpqgfsgplce 508 

I : : 1:11 11:11 II 
Qy 61 CIVVQQEPTCRCPPGFAGPRCE 82



RESULT 10

ID W05834 standard; Protein; 1257 AA.
AC W05834;
DT 28-JAN-1997 (first entry)
DE Human Serrate-2 (HJ2).
KW Serrate-2; human jagged-2; HJ2; Notch; cell differentiation;
KW cell fate; central nervous system; cancer; tissue repair; therapy;
KW diagnosis; antibody.
OS Homo sapiens.
FH Key Location/Qualifiers
FT domain 1..912
FT /label= Extracellular_domain
FT /note= "a deletion in the encoding cDNA clone
FT results in loss of part of the Serrate-2
FT signal peptide and beginning of the DSL
FT domain"
FT domain 26..70
FT /label= DSL
FT /note= "region of homology with Drosophila Delta
FT and Serrate, predicted to mediate binding
FT with Notch"
FT domain 75..735
FT /label= ELR
FT /note= "epidermal growth factor-like repeat domain"
FT region 75..105
FT /label= ELR1
FT region 106..140
FT /label= ELR2
FT region 141..180
FT /label= ELR3
FT region 181..218
FT /label= ELR4
FT region 219..256
FT /label= ELR5
FT region 257..294
FT /label= ELR6
FT region 295..331
FT /label= ELR7
FT region 332..369
FT /label= ELR8
FT region 370..407
FT /label= ELR9
FT region 408..435
FT /label= Partial_ELR
FT region 436..469
FT /label= Partial_ELR
FT region 470..507
FT /label= ELR10
FT region 508..545
FT /label= ELR11
FT region 546..584
FT /label= ELR12
FT region 585..622
FT /label= ELR13
FT region 623..660
FT /label= ELR14
FT region 664..701
FT /label= ELR15
FT region 702..718
FT /label= Partial_ELR
FT region 719..735
FT /label= Partial_ELR
FT domain 913..933
FT /label= Transmembrane_domain
FT domain 934..1257
FT /label= Intracellular_domain
PN WO9627610-A1.
PD 12-SEP-1996.
PF 07-MAR-1996; U03172.
PR 07-MAR-1995; US-400159.
PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY.
PA (UYYA ) UNIV YALE.
PI Artavanis-Tsakonas S, Gray GE, Henrique DMP, Ish-Horowicz D;
PI Lewis JH, Mann RS, Myat AM;
DR WPI; 96-425379/42.
DR N-PSDB; W05834.
PT Vertebrate Serrate protein and related DNA - used to treat or
PT prevent malignancies characterised by increased Notch activity.
PS Claim 5; Page 104-107; 161pp; English.
CC Human Serrate-1 (W05833) and human Serrate-2 (W05833) are ligands
CC for the zygotic neurogenic locus Notch, and are believed to play a
CC major role in determining cell fates (differentiation) in the
CC central nervous system. Their amino acid sequences were deduced
CC from cDNA clones (see also T40090-91) isolated from human foetal
CC brain cDNA libraries. The proteins, antibodies raised to them,
CC and encoding nucleic acids can be used in the detection of
CC Serrate sequences and in the treatment of disorders of cell fate
CC or differentiation, partic. cancer, nervous system disorders
CC and in tissue repair or regeneration.
SQ Sequence 1257 AA;

Query Match 25.2%; Score 203; DB 19; Length 1257; 

Best Local Similarity 42.7%; Pred. No. 2.26e-08;

Matches 35; Conservative 15; Mismatches 24; Indels 8; Gaps 2; 

Db 291 nvndcrgq-cqhggtckdlvngyqcvcprgfggrhce-------lerdkcasspchsggl 342

I :!l I: Ml: I I llll MM: || : | 

Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60 

Db 343 cedladgfhchcpqgfsgplce 364 

I : : M ||;|| II 
Qy 61 CIVVQQEPTCRCPPGFAGPRCE 82



RESULT 11 

ID W80813 standard; Protein; 685 AA.
AC W80813;

DT 16-FEB-1999 (first entry)

DE Nucleotide sequence of the human Delta 3 protein.

KW Human; Delta 3 protein; agonist; tissue regeneration;

KW neurodegenerative disease; neurodifferentiative disorder;

KW neurodevelopmental disorder; peripheral neuropathy; 

KW spinocerebella degeneration; antagonist; neoplastic disease; 

KW hyperplastic disease; cancer; Waldenstroem's macroglobulemia; 

KW fibroproliferative disorder; cerebravascular tissue; gene therapy; 

KW antibody. 

OS Homo sapiens. 

PN WO9845434-A1.

PD 15-OCT-1998.

PF 06-APR-1998; 006775.

PR U-JUN-1997; US-872855.

PR 04-APR-1997; US-832633. 

PA (MILL-) MILLENNIUM BIOTHERAPEUTICS INC. 

PI Gearing DP, McCarthy SA; 

DR WPI; 98-594482/50. 

DR N-PSDB; V68523. 

PT New isolated human Delta3 gene - used to develop products for 

PT treating, e.g. nerve injury, neurodegenerative disorders, peripheral 

PT neuropathies and spinocerebella degenerations 

PS Claim 2; Fig 1; 160pp; English. 

CC This is the amino acid sequence of the human Delta 3 protein 

CC used in the method of the invention. The Delta3 gene is involved in

CC the growth and differentiation of cells. Delta3 agonists can be used

CC for promoting the tissue regeneration or repair needed to treat a

CC nerve injury, neurodegenerative disease, neurodifferentiative or

CC neurodevelopmental disorders including peripheral neuropathies and

CC spinocerebella degenerations. Delta3 antagonists can be used to treat

CC neoplastic or hyperplastic diseases, e.g. cancers, Waldenstroem's

CC macroglobulemia and fibroproliferative disorders, particularly of

CC cerebravascular tissue. The nucleic acids can also be used for gene

CC therapy. The products can also be used for antibody production,

CC detection, diagnosis and drug screening.

SQ Sequence 685 AA;

Query Match 24.8%; Score 200; DB 37; Length 685;

Best Local Similarity 41.5%; Pred. No. 4.01e-08;

Matches 34; Conservative 16; Mismatches 23; Indels 9; Gaps 4; 

Db 327 ecdsnpcrnggsckdqedgyhclcppgyyglhcehst----l--s-cadspcfnggscre 379 

M :: ||:|: ! I: :|| hi! |: || |||: I | | ; II: 
Qy 4 DCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCIV 63 

Db 380 rnqganyacecppnftgsncek 401 

I : :| III hi: III 
Qy 64 VQQEP--TCRCPPGFAGPRCEK 83



RESULT 12 

ID R38304 standard; Protein; 1404 AA. 

AC R38304; 

DT 30-NOV-1993 (first entry)

DE Sequence of a serrate protein.

KW Serrate; toporythmic protein; family.

OS Drosophila melanogaster.

PN WO9312141-A.

PD 24-JUN-1993.

PF 11-DEC-1991; U09240.

PR 11-DEC-1991; WO-U09240.

PA (UYYA ) UNIV YALE.

PI Artavanis-Tsakonas S, Fleming RJ;

DR WPI; 93-214095/26. 

DR N-PSDB; Q43910. 

PT Purified serrate protein, nucleic acid and antibodies - used in 

PT the study and manipulation of differentiation and other 

PT physiological processes 

PS Claim 4; Pages 74-80; 119pp; English. 

CC Two Drosophila genomic phage libraries were screened and recombinant

CC clones were isolated. The cDNAs in lambda gt10 were isolated from an

CC early pupal library. The C1 cDNA was isolated from an early pupal

CC library. Subsequently the C3 cDNA was isolated using the 5' 700 bp

CC terminal fragment of the C1 cDNA as probe. The complete 5561bp

CC sequence of DNA of the Drosophila Serrate protein was derived from

CC C1 and C3 cDNAs (Q43910). The deduced protein product appears to be



CC a transmembrane protein. AAs 51-80 represent the likely signal 

CC peptide; aas 542-564 represent potential membrane associated region; 

CC aas 1221-1245 represent the putative transmembrane domain. 

SQ Sequence 1404 AA; 

Query Match 23.5%; Score 190; DB 7; Length 1404;

Best Local Similarity 41.3%; Pred. No. 2.69e-07;

Matches 33; Conservative 15; Mismatches 24; Indels 8; Gaps 6; 

Db 609 ddcvgq-crngatcidlvndyrcacasgftgrdce--td-i--d-e-catspcrnggecv 660 

llllh 11:11 1:1 II I I I: 1 1 : 1 II : : I |:||::|: 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62 

Db 661 dmvgkfncicplgysgslce 680 
: Ml l::h I! ' 
Qy 63 VVQQEPTCRCPPGFAGPRCE 82



RESULT 13 

ID W11720 standard; Protein; 722 AA. 

AC W11720; 

DT 28-APR-1997 (first entry) 

DE M-Delta-1 polypeptide. 

KW M-Delta-1; cell proliferation; nervous system disorder; 

KW tissue regeneration; Notch; cervix cancer; breast cancer; 

KW lung cancer; colon cancer; melanoma; seminoma; 

KW neurogenesis; therapy. 

OS Mus sp. 

PN WO9701571-A1. 

PD 16-JAN-1997. 

PF 28-JUN-1996; 011178. 

PR 28-JUN-1995; US-000589. 

PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY. 

PA (UYYA ) UNIV YALE. 

PI Artavanis-Tsakonas S, Gray GE, Henrique D, Ish-Horowicz D; 

PI Lewis J; 

DR WPI; 97-100159/09. 

DR N-PSDB; T58899. 

PT New vertebrate Delta protein, DNA and antibodies - for treating and

PT preventing cancer, nervous system disorders and for tissue 

PT regeneration 

PS Claim 4; Fig 8; 135pp; English. 

CC M-delta-1 polypeptide (W11720) is the mouse homologue of Drosophila 

CC Delta, a protein that binds to Notch protein. It is expressed 

CC primarily in presomitic mesoderm, the central and peripheral 

CC nervous systems, and kidney. Chick (W11719) and human (W11721-

CC 38) Delta-1 polypeptides have also been identified. Delta-1

CC proteins can be used to treat or prevent disorders characterised by

CC increased Notch activity, such as cervical, breast, lung or colon

CC cancer, melanoma or seminoma, as well as nervous system disorders,

CC and to promote tissue regeneration and repair.

SQ Sequence 722 AA; 

Query Match 23.3%; Score 188; DB 21; Length 722; 

Best Local Similarity 38.8%; Pred. No. 3.93e-07; 

Matches 33; Conservative 17; Mismatches 28; Indels 7; Gaps 3; 

Db 442 nvddcasspcanggtcrdsvndfsctcppgytgknc--sap-v----srcehapchngat 494 

I III : I :h I I II ::| II l::l I ::| I I |:: |:||| 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60 

Db 495 chqrgqrymcecaqgyggpncqfll 519 

I III: h:ll I: h 
Qy 61 CIVVQQEPTCRCPPGFAGPRCEKLI 85



RESULT 14 

ID W18351 standard; protein; 1036 AA. 

AC W18351; 

DT 11-FEB-1998 (first entry)

DE Proliferation and differentiation suppression polypeptide. 

KW Proliferation; differentiation; suppression; human; delta-1; 

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour; 
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KW 
OS 
PN 
PD 
PF 
PR 



immunosuppression. 
Homo sapiens. 



Qy 



: Ml -III 
63 WQQEPTCRCPPGFAGPRCEKL 84 



W09719172-A1. 

29- MAY-1997. 
15-NOV-1996; J03356. 

30- NOV-1995; JP-311811. 



Search completed: Fri May 28 09:34:58 1999 
Job time : 58 sees. 



PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO RK. 

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 5; Page 66-71; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

•proliferation and differentiation of undifferentiated cells such 
as neurons and blood cells. -The polypeptide may be used for the 
prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression. 

SQ Sequence 1036 AA; 

Query Match 23.3%; Score 188; DB 25; Length 1036; 

Best Local Similarity 36.6%; Pred. No. 3.93e-07; 

Matches 30; Conservative 17; Mismatches 28; Indels 7; Gaps 2; 

Db ' 458 decasnpclngghcqneinrfqclcptgfsgnlc-q ldidycepnpcqngaqcy 510 



Db 511 nrasdyfckcpedyegkncshl 532 

: 1 : 1 1 : II I 
Qy 63 WQQEPTCRCPPGFAGPRCEKL 84 



RESULT 15 

ID W18352 standard; protein; 1187 AA. 

AC W18352; 

DT ll-FEB-1998 (first entry) 

DE Proliferation and differentiation suppression polypeptide. 

KW Proliferation; differentiation; suppression; human; delta-1; 

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour; 

KW immunosuppression. 

§Homo sapiens. 
W09719172-A1. 
29-MAM997. 

PF 15-NOV-1996; J03356. 

PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 6; Page 71-76; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells . The polypeptide may be used for the 

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression. 

SQ Sequence 1187 AA; 

Query Match 23.3%; Score 188; DB 25; Length 1187; 

Best Local Similarity 36.6%; Pred. No. 3.93e-07;

Matches 30; Conservative 17; Mismatches 28; Indels 7; Gaps 2; 

Db 458 decasnpclngghcqneinrfqclcptgfsgnlc-q------ldidycepnpcqngaqcy 510

1:1 :: I :|::| :|:| : hll Mil :| : I: I: lllllll 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62 




Db 511 nrasdyfckcpedyegkncshl 532 
Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on:          Fri May 2 09:35:16 1999; MasPar time 7.12 Seconds
                 585.478 Million cell updates/sec

Tabular output not generated

Title:           >US-09-191-647-13
Description:     (1-104) from US09191647.pep
Perfect Score:   807
Sequence:        1 NNDDCVGHKCRHGAQCVDEV ITVNFVGKDSYVELASAKVR 104

Scoring table:   PAM 150
                 Gap 11
Searched:        122810 seqs, 40068593 residues

Post-processing: Minimum Match 0%
                 Listing first 45 summaries

Database:        pir60
                 1:pir1 2:pir2 3:pir3 4:pir4

Statistics:      Mean 36.296; Variance 59.185; scale 0.613

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.



SUMMARIES

              Query
 No.  Score   Match  Length  DB  ID      Description            Pred. No.
   1    343    42.5     530   2  A31640  epidermal growth fact  1.29e-57
   2    343    42.5    1469   2  B36665  slit protein 2 precur  1.29e-57
   3    343    42.5    1480   2  A36665  slit protein 1 precur  1.29e-57
   4    215    26.6    2555   2  A40043  notch protein homolog  8.30e-28
   5    214    26.5    1064   2  A40136  fibropellin Ia - sea   1.39e-27
   6    209    25.9    2703   2  A24420  notch protein - fruit  1.81e-26
   7    205    25.4     293   2  B26637  neurogenic repetitive  1.40e-25
   8    205    25.4    2139   2  A35672  crumbs protein - frui  1.40e-25
   9    205    25.4    2321   2  S78549  notch3 protein - huma  1.40e-25
  10    205    25.4    2524   2  A35844  Xotch protein - Afric  1.40e-25
  11    202    25.0    2318   2  S45306  notch 3 protein - mou  6.47e-25
  12    198    24.5     861   2  A48825  Notch homolog Motch p  4.94e-24
  13    198    24.5    2531   2  A46019  gene Notch-1 protein   4.94e-24
  14    198    24.5    2531   2  S18188  notch protein homolog  4.94e-24
  15    197    24.4     387   2  B49175  Motch A protein - mou  8.20e-24
  16    196    24.3    1203   2  A49175  Motch B protein - mou  1.36e-23
  17    195    24.2    1429   2  S06434  homeotic protein lin-  2.25e-23
  18    195    24.2    2471   2  A49128  cell-fate determining  2.25e-23
  19    193    23.9     570   2  A48836  fibropellin C precurs  6.19e-23
  20    192    23.8    2437   2  S42612  transmembrane protein  1.02e-22
  21    190    23.5    1404   2  A36666  serrate protein precu  2.80e-22
  22    190    23.5    1408   2  S16148  gene serrate protein   2.80e-22
  23    188    23.3     722   2  I48324  DELTA-like 1 - mouse   7.64e-22
  24    188    23.3    1220      A56136  jagged protein precur  7.64e-22
  25    184    22.8     385      A54785  preadipocyte factor 1  5.65e-21
  26    184    22.8     385      S53718  homeotic protein dlk   5.65e-21
  27    178    22.1     728      I50719  C-Delta-1 - chicken    1.12e-19
  28    175    21.7    4391      A38096  perlecan precursor -   4.91e-19
  29    171    21.2     492      EXBO    coagulation factor Xa  3.51e-18
  30    171    21.2     832      A31246  neurogenic protein De  3.51e-18
  31    171    21.2     833      S19087  gene Delta protein pr  3.51e-18
  32    171    21.2     880      S00670  gene Delta protein pr  3.51e-18
  33    171    21.2    4543      A53102  alpha-2-macroglobulin  3.51e-18
  34    165    20.4     475      EXCH    coagulation factor Xa  6.57e-17
  35    164    20.3     200      A26637  neurogenic repetitive  1.07e-16
  36    164    20.3     259      S48713  fetal antigen 1 - hum  1.07e-16
  37    164    20.3     260      A44549  fetal antigen 1 homeo  1.07e-16
  38    164    20.3     383      B45484  delta-like dlk homeot  1.07e-16
  39    164    20.3     383      S53716  homeotic protein dlk   1.07e-16
  40    163    20.2    3051      S42373  hypothetical protein   1.73e-16
  41    162    20.1    1295      A32901  glp1 protein precurso  2.81e-16
  42    157    19.5     482      EXRT    coagulation factor Xa  3.13e-15
  43    154    19.1     443      I46932  coagulation factor VI  1.31e-14
  44    152    18.8     427      S74211  PAS-6/7 protein precu  3.40e-14
  45    152    18.8     488      EXHO    coagulation factor Xa  3.40e-14



RESULT 1
ENTRY           A31640 #type fragment
TITLE           epidermal growth factor-like protein slit - fruit fly
                (Drosophila melanogaster) (fragment)
ORGANISM        #formal_name Drosophila melanogaster
DATE            28-Feb-1990 #sequence_revision 28-Feb-1990 #text_change
                14-Aug-1998
ACCESSIONS      A31640
REFERENCE       A31640
   #authors     Rothberg, J.M.; Hartley, D.A.; Walther, Z.;
                Artavanis-Tsakonas, S.
   #journal     Cell (1988) 55:1047-1059
   #title       slit: An EGF-homologous locus of D. melanogaster involved in
                the development of the embryonic central nervous system.
   #cross-references MUID:89077533
   #accession   A31640
   ##molecule_type DNA
   ##residues   1-530 ##label ROT
   ##cross-references GB:M23543; NID:g340939; PID:g514357
GENETICS
   #gene        FlyBase:sli
   ##cross-references FlyBase:FBgn0003425
   #introns     470/3
CLASSIFICATION  #superfamily EGF homology
KEYWORDS        growth factor
FEATURE
   148-181      #domain EGF homology #label EGF
SUMMARY         #length 530 #checksum 6330

Query Match 42.5%; Score 343; DB 2; Length 530;
Best Local Similarity 42.2%; Pred. No. 1.29e-57;
Matches 46; Conservative 26; Mismatches 31; Indels 6; Gaps 5;

Db 184 NIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMISMMYPQTSPCQNHECKHG 243
I III hl::h III :| I I lh ::l Ml I :|: Mill: II :| 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCE-HPP-PMVLLQTSPCDQYECQNG 58 

Db 244 V-CFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELEPLRTR 291 

I: I = III ll"l :ll I :::ll "hill : : I 
Qy 59 AQCIWQ-QEPT--CRCPPGFAGPRCEKLITVNFVGKDSYVELASAKVR 104 



RESULT 2
ENTRY           B36665 #type complete
TITLE           slit protein 2 precursor - fruit fly (Drosophila
                melanogaster)
ORGANISM        #formal_name Drosophila melanogaster



Tue Jun 1 10:16:04 1999          US-09-191-647-13.ipr          Page 2



DATE            30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change
                16-Dec-1998
ACCESSIONS      B36665
REFERENCE       A36665
   #authors     Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.;
                Artavanis-Tsakonas, S.
   #journal     Genes Dev. (1990) 4:2169-2187
   #title       slit: an extracellular protein necessary for development of
                midline glia and commissural axon pathways contains both
                EGF and LRR domains.
   #cross-references MUID:91099665
   #accession   B36665
   ##status     preliminary
   ##molecule_type mRNA
   ##residues   1-1469 ##label ROT
   ##cross-references GB:X53959
GENETICS
   #gene        FlyBase:sli
   ##cross-references FlyBase:FBgn0003425
CLASSIFICATION  #superfamily proteoglycan amino-terminal homology; EGF
                homology; leucine-rich alpha-2-glycoprotein repeat
                homology; proteoglycan carboxyl-terminal homology



FEATURE
   66-91        #domain proteoglycan amino-terminal homology #label PAH1\
   101-124      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   125-148      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   149-172      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   173-196      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   197-220      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   228-272      #domain proteoglycan carboxyl-terminal homology #label PCS1\
                #domain proteoglycan amino-terminal homology #label PAH2\
   323-346      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   347-370      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   371-394      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   395-418      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   419-442      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
   450-494      #domain proteoglycan carboxyl-terminal homology #label PCS2\
   512-537      #domain proteoglycan amino-terminal homology #label PAH3\
   547-571      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
   572-595      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
   596-619      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
   620-643      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
   651-695      #domain proteoglycan carboxyl-terminal homology #label PCS3\
   708-733      #domain proteoglycan amino-terminal homology #label PAH4\
   743-766      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
   767-790      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
   846-890      #domain proteoglycan carboxyl-terminal homology #label PCS4\
   1028-1061    #domain EGF homology #label EGF
SUMMARY         #length 1469 #molecular-weight 164695 #checksum 8361



Query Match 42.5%; Score 343; DB 2; Length 1469;

Best Local Similarity 42.2%; Pred. No. 1.29e-57; 

Matches 46; Conservative 26; Mismatches 31; Indels 6; Gaps 5; 

Db 1064 NIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMISMMYPQTSPCQNHECKHG 1123

I II! I:h:|: III :| I I II: :;| :M I :|: lllll: I! :| 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCE-HPP-PMVLLQTSPCDQYECQNG 58

Db 1124 V-CFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELEPLRTR 1171 

I: I : III ll::| :|| I :::|| ::|:||| : : I 
Qy 59 AQCIWQ-QEPT--CRCPPGFAGPRCEKLITVNFVGKDSYVELASAKVR 104



RESULT 3
ENTRY           A36665 #type complete
TITLE           slit protein 1 precursor - fruit fly (Drosophila
                melanogaster)
ORGANISM        #formal_name Drosophila melanogaster
DATE            30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change
                24-Sep-1998
ACCESSIONS      A36665; S13523
REFERENCE       A36665
   #authors     Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.;
                Artavanis-Tsakonas, S.
   #journal     Genes Dev. (1990) 4:2169-2187
   #title       slit: an extracellular protein necessary for development of
                midline glia and commissural axon pathways contains both
                EGF and LRR domains.
   #cross-references MUID:91099665
   #accession   A36665
   ##status     preliminary
   ##molecule_type mRNA
   ##residues   1-1480 ##label ROT
   ##cross-references GB:X53959; NID:g8614; PID:g8615
GENETICS
   #gene        FlyBase:sli
   ##cross-references FlyBase:FBgn0003425
CLASSIFICATION  #superfamily proteoglycan amino-terminal homology; EGF
                homology; leucine-rich alpha-2-glycoprotein repeat
                homology; proteoglycan carboxyl-terminal homology



KEYWORDS        alternative splicing
FEATURE
   66-91        #domain proteoglycan amino-terminal homology #label PAH1\
   101-124      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
   125-148      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
   149-172      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
   173-196      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
   197-220      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
   228-272      #domain proteoglycan carboxyl-terminal homology #label PCS1\
   288-313      #domain proteoglycan amino-terminal homology #label PAH2\
   323-346      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
   347-370      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
   371-394      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
   395-418      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
   419-442      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
   450-494      #domain proteoglycan carboxyl-terminal homology #label PCS2\
   512-537      #domain proteoglycan amino-terminal homology #label PAH3\
   547-571      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
   572-595      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
   596-619      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
   620-643      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
   651-695      #domain proteoglycan carboxyl-terminal homology #label PCS3\
   708-733      #domain proteoglycan amino-terminal homology #label PAH4\
   743-766      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
   767-790      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
   791-814      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR17\
   815-838      #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR18\
   846-890      #domain proteoglycan carboxyl-terminal homology #label PCS4\
   1028-1061    #domain EGF homology #label EGF
SUMMARY         #length 1480 #molecular-weight 165751 #checksum 900



Query Match 42.5%; Score 343; DB 2; Length 1480; 

Best Local Similarity 42.2%; Pred. No. 1.29e-57;

Matches 46; Conservative 26; Mismatches 31; Indels 6; Gaps 5; 

Db 1064 NIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMISMMYPQTSPCQNHECKHG 1123 

I III hl::|: III :| I I II: ::l :|| I :|: llllh II :| 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCE-HPP-PMVLLQTSPCDQYECQNG 58

Db 1124 V-CFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELEPLRTR 1171 

I: I : III ll::| :|| I :::|| ::|:||| : : I 
Qy 59 AQCIWQ-QEPT--CRCPPGFAGPRCEKLITVNFVGKDSYVELASAKVR 104 



RESULT 4
ENTRY           A40043 #type complete
TITLE           notch protein homolog TAN-1 precursor - human
ORGANISM        #formal_name Homo sapiens #common_name man
DATE            21-Apr-1992 #sequence_revision 21-Apr-1992 #text_change
                14-Aug-1998
ACCESSIONS      A40043
REFERENCE       A40043
   #authors     Ellisen, L.W.; Bird, J.; West, D.C.; Soreng, A.L.; Reynolds,
                T.C.; Smith, S.D.; Sklar, J.
   #journal     Cell (1991) 66:649-661
   #title       TAN-1, the human homolog of the Drosophila Notch gene, is
                broken by chromosomal translocations in T lymphoblastic
                neoplasms.
   #cross-references MUID:91347367
   #accession   A40043
   ##status     preliminary; nucleic acid sequence not shown; not
                compared with conceptual translation
   ##molecule_type mRNA
   ##residues   1-2555 ##label ELL
   ##cross-references GB:M73980
CLASSIFICATION  #superfamily unassigned ankyrin repeat proteins; ankyrin
                repeat homology; EGF homology
FEATURE
   1149-1180    #domain EGF homology #label EGF\
   1927-1959    #domain ankyrin repeat homology #label AN1\
   1960-1992    #domain ankyrin repeat homology #label AN2\
   1994-2026    #domain ankyrin repeat homology #label AN3\
   2027-2059    #domain ankyrin repeat homology #label AN4\
   2060-2092    #domain ankyrin repeat homology #label AN5
SUMMARY         #length 2555 #molecular-weight 272337 #checksum 463

Query Match 26.6%; Score 215; DB 2; Length 2555;
Best Local Similarity 41.0%; Pred. No. 8.30e-28;
Matches 34; Conservative 18; Mismatches 24; Indels 7; Gaps 4;



Db 947 NECASDPCRNGANCTDCVDSYTCTCPAGFSGIHCENNTPDCT-ESS-C--F---NGGTCV 999
' -I : 1 1 : 1 1 I I |::||| II illh ||: I ::| I : ||: |: 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62

Db 1000 DGINSFTCLCPPGFTGSYCQHW 1022
II llllhl: I: :: 
Qy 63 WQQEPTCRCPPGFAGPRCEKLI 85



RESULT 5
ENTRY           A40136 #type complete
TITLE           fibropellin Ia - sea urchin (Strongylocentrotus purpuratus)
ALTERNATE_NAMES epidermal growth factor homolog precursor
CONTAINS        alternatively spliced fibropellin Ib (EGFI)
ORGANISM        #formal_name Strongylocentrotus purpuratus #common_name
                purple urchin
DATE            13-May-1992 #sequence_revision 17-Sep-1997 #text_change
                07-Aug-1998
ACCESSIONS      A40136; B40136; C40136; A29316; A43131
REFERENCE       A40136
   #authors     Delgadillo-Reynoso, M.G.; Rollo, D.R.; Hursh, D.A.; Raff,
                R.A.
   #journal     J. Mol. Evol. (1989) 29:314-327
   #title       Structural analysis of the uEGF gene in the sea urchin
                Strongylocentrotus purpuratus reveals more similarity to
                vertebrate than to invertebrate genes with EGF-like
                repeats.
   #cross-references MUID:90112459
   #accession   A40136
   ##status     preliminary
   ##molecule_type mRNA
   ##residues   1-114 ##label DEL
   ##cross-references GB:X17530; NID:g10225; PID:g667061
   #accession   B40136
   ##status     preliminary; not compared with conceptual translation
   ##molecule_type DNA
   ##residues   181-251,329-370,'R',372-408,'RA',411-441 ##label DE2
   #accession   C40136
   ##status     preliminary; not compared with conceptual translation
   ##molecule_type DNA
   ##residues   'T',747-821,898-978 ##label DE3
REFERENCE       A29316
   #authors     Hursh, D.A.; Andrews, M.E.; Raff, R.A.
   #journal     Science (1987) 237:1487-1490
   #title       A sea urchin gene encodes a polypeptide homologous to
                epidermal growth factor.
   #cross-references MUID:87319677
   #accession   A29316
   ##status     preliminary
   ##molecule_type mRNA
   ##residues   'S',280-481,786-1064 ##label HUR
   ##cross-references GB:M17421; NID:g161474; PID:g552260
REFERENCE       A43131
   #authors     Hunt, L.T.; Barker, W.C.
   #journal     FASEB J. (1989) 3:1760-1764
   #title       Avidin-like domain in an epidermal growth factor homolog from
                a sea urchin.
   #cross-references MUID:89196806
   #contents    annotation
COMMENT         EGF homology repeats 10-17 are spliced out in the short form
                (fibropellin Ib).
CLASSIFICATION  #superfamily C1r/C1s repeat homology; EGF homology
FEATURE
   1-19         #domain signal sequence #status predicted #label SIG\
   20-1064      #product fibropellin I #status predicted #label FIB\
   23-54        #domain EGF homology #label EG01\
   57-175       #domain C1r/C1s repeat homology #label CSR\
   180-211      #domain EGF homology #label EG02\
   218-249      #domain EGF homology #label EG03\
   256-287      #domain EGF homology #label EG04\
   294-325      #domain EGF homology #label EG05\
   332-363      #domain EGF homology #label EG06\
   370-401      #domain EGF homology #label EG07\
   408-439      #domain EGF homology #label EG08\
   446-477      #domain EGF homology #label EG09\
   484-515      #domain EGF homology #label EG10\
   522-553      #domain EGF homology #label EG11\
   560-591      #domain EGF homology #label EG12\
   598-629      #domain EGF homology #label EG13\
   636-667      #domain EGF homology #label EG14\
   674-705      #domain EGF homology #label EG15\
   712-743      #domain EGF homology #label EG16\
   750-781      #domain EGF homology #label EG17\
   788-819      #domain EGF homology #label EG18\
   826-857      #domain EGF homology #label EG19\
   864-895      #domain EGF homology #label EG20\
   902-933      #domain EGF homology #label EG21\
   936-1064     #region avidin-like\
   23-34,28-43,45-54,
   62-88,180-191,
   185-200,202-211,
   218-229,223-238,
   240-249,256-267,
   261-276,278-287,
   294-305,299-314,
   316-325,332-343,
   337-352,354-363,
   370-381,375-390,
   392-401,408-419,
   413-428,430-439,
   446-457,451-466,
   468-477,484-495,
   489-504,506-515,
   522-533,527-542,
   544-553,560-571,
   565-580,582-591,
   598-609,603-618,
   620-629,636-647,
   641-656,658-667,
   674-685,679-694,
   696-705,712-723,
   717-732,734-743,
   750-761,755-770,
   772-781,788-799,
   793-808,810-819,
   826-837,831-846,
   848-857,864-875,
   869-884,886-895,
   902-913,907-922,
   924-933
                #disulfide_bonds #status predicted\
                #disulfide_bonds #status predicted
SUMMARY         #length 1064 #molecular-weight 112072 #checksum 303

Query Match 26.5%; Score 214; DB 2; Length 1064;
Best Local Similarity 42.7%; Pred. No. 1.39e-27;
Matches 35; Conservative 13; Mismatches 27; Indels 7; Gaps 2;

Db 822 NIDECASDPCLNGGICVDGVNGFVCQCPPNYSGTYCE-----ISLDA--CRSMPCQNGAT 874

I 1:1 : I :h III III: I II :|| :|| : |:: I Mill 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60

Db 875 CVNVGADYVCECVPGYAGQNCE 896 

I: I : I I 11:11 II 
Qy 61 CIWQQEPTCRCPPGFAGPRCE 82 



RESULT 6
ENTRY           A24420 #type complete
TITLE           notch protein - fruit fly (Drosophila melanogaster)
ALTERNATE_NAMES neurogenic repetitive locus protein
ORGANISM        #formal_name Drosophila melanogaster
DATE            30-Jun-1987 #sequence_revision 30-Jun-1987 #text_change
                07-Aug-1998
ACCESSIONS      A24420; A24768; S09358; A05267
REFERENCE       A24420
   #authors     Kidd, S.; Kelley, M.R.; Young, M.W.
   #journal     Mol. Cell. Biol. (1986) 6:3094-3108
   #cross-references MUID:87064624
   #accession   A24420
   ##molecule_type DNA
   ##residues   1-2703 ##label KID
   ##cross-references GB:K03508; NID:g157991; PID:g157993
REFERENCE       A24768
   #authors     Wharton, K.A.; Johansen, K.M.; Xu, T.; Artavanis-Tsakonas, S.
   #journal     Cell (1985) 43:567-581
   #cross-references MUID:86079539
   #accession   A24768
   ##molecule_type mRNA
   ##residues   1-48,'I',50-118,'R',120-230,'V',232-256,'T',258-266,'A',
                268-872,'R',874-958,'R',960-1970,'FH',1973-2256,'G',
                2258-2264,'V',2266-2406,'R',2408-2444,'L',2446-2703
                ##label WHA1
   ##note       the authors translated the codon ATC for residue 49 as
                Thr, ATT for residue 2044 as Arg, GTA for residue 2265
                as Ala, CGC for residue 2407 as His, and CTT for
                residue 2445 as Arg
REFERENCE       S09358
   #authors     Tautz, D.
   #journal     Nucleic Acids Res. (1989) 17:6463-6471
   #title       Hypervariability of simple sequences as a general source for
                polymorphic DNA markers.
   #cross-references MUID:89385974
   #accession   S09358
   ##molecule_type DNA
   ##residues   2505-2551,'QQQQ',2552-2576,'T',2578-2604 ##label TAU
REFERENCE       A05267
   #authors     Wharton, K.A.; Yedvobnick, B.; Finnerty, V.G.;
                Artavanis-Tsakonas, S.
   #journal     Cell (1985) 40:55-62
   #title       opa: a novel family of transcribed repeats shared by the
                Notch locus and other developmentally regulated loci in D.
                melanogaster.
   #cross-references MUID:85099329
   #accession   A05267
   ##molecule_type DNA
   ##residues   2504-2576,'E',2578-2611 ##label WHA2
GENETICS
   #gene        notch; opa
   ##cross-references FlyBase:FBgn0004647
   #map_position 8.96-9.36
   #introns     53/3; 84/3; 171/3; 240/3; 283/3; 2333/3; 2436/3; 2588/3
CLASSIFICATION  #superfamily notch protein; ankyrin repeat homology; EGF
                homology
KEYWORDS        differentiation; tandem repeat; transmembrane protein
FEATURE
   27-43        #domain transmembrane #status predicted #label TMM1\
   568-599      #domain EGF homology #label EGF\
   1746-1762    #domain transmembrane #status predicted #label TMM2\
   1950-1982    #domain ankyrin repeat homology #label AN1\
   1983-2015    #domain ankyrin repeat homology #label AN2\
   1988-2004    #domain transmembrane #status predicted #label TMM3\
   2017-2049    #domain ankyrin repeat homology #label AN3\
   2050-2082    #domain ankyrin repeat homology #label AN4\
   2083-2115    #domain ankyrin repeat homology #label AN5\
   2538-2568    #region glutamine-rich\
   2538-2568    #domain neurogenic repetitive element #status predicted
                #label OPA
SUMMARY         #length 2703 #molecular-weight 288876 #checksum 6404

Query Match 25.9%; Score 209; DB 2; Length 2703; 

Best Local Similarity 40.7%; Pred. No. 1.81e-26; 

Matches 33; Conservative 14; Mismatches 30; Indels 4; Gaps 2; 

Db 293 NYDDCLGHLCQNGGTCIDGISDYTCRCPPNFTGRFCQDD---VD-ECAQRDHPVCQNGAT 348

I llhll h:|: hi :: III II hi lh I : : h Mill 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60 

Db 349 OTHGSYSCICVNGWAGLDC 369 

I : :l I I II I 
Qy 61 CIWQQEPTCRCPPGFAGPRC 81



RESULT 7
ENTRY           B26637 #type fragment
TITLE           neurogenic repetitive locus 95F protein - fruit fly
                (Drosophila melanogaster) (fragment)
ORGANISM        #formal_name Drosophila melanogaster
DATE            16-Aug-1988 #sequence_revision 16-Aug-1988 #text_change
                14-Aug-1998
ACCESSIONS      B26637
REFERENCE       A91081
   #authors     Knust, E.; Dietrich, U.; Tepass, U.; Bremer, K.A.; Weigel,
                D.; Vaessin, H.; Campos-Ortega, J.A.
   #journal     EMBO J. (1987) 6:761-766
   #title       EGF homologous sequences encoded in the genome of Drosophila
                melanogaster, and their relation to neurogenic genes.
   #cross-references MUID:87218537
   #accession   B26637
   ##molecule_type mRNA
   ##residues   1-293 ##label KNU
   ##cross-references GB:X05144; NID:g7519; PID:g929536
GENETICS
   #gene        FlyBase:crb
   ##cross-references FlyBase:FBgn0000368
CLASSIFICATION  #superfamily EGF homology
KEYWORDS        transmembrane protein
FEATURE
   216-252      #domain EGF homology #label EGF
SUMMARY         #length 293 #checksum 3413

Query Match 25.4%; Score 205; DB 2; Length 293;
Best Local Similarity 34.9%; Pred. No. 1.40e-25;
Matches 29; Conservative 14; Mismatches 37; Indels 3; Gaps 3;

Db 173 NIDECADQPCHNGGNCTDLIASYVCDCPEDYMGPQCDVLKQMTC-ENEPCRNGSTCQNGF 231 

I hi : I::h I I : : I I 1 1 : : I I : I : II : III 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPC-DQYECQNGA 59



Db 232 N-ASTGNNFTCTCVPGFEGPLCD 253 

: II I III II I: 
Qy 60 QCIWQQEPTCRCPPGFAGPRCE 82 



RESULT 8
ENTRY           A35672 #type complete
TITLE           crumbs protein - fruit fly (Drosophila melanogaster)
ORGANISM        #formal_name Drosophila melanogaster
DATE            21-Sep-1990 #sequence_revision 18-Nov-1992 #text_change
                14-Aug-1998
ACCESSIONS      A35672
REFERENCE       A35672
   #authors     Tepass, U.; Theres, C.; Knust, E.
   #journal     Cell (1990) 61:787-799
   #title       crumbs encodes an EGF-like protein expressed on apical
                membranes of Drosophila epithelial cells and required for
                organization of epithelia.
   #cross-references MUID:90263104
   #accession   A35672
   ##status     preliminary
   ##molecule_type mRNA
   ##residues   1-2139 ##label TEP
   ##cross-references GB:M33753
   ##note       the authors translated the codon GGC for residue 1928 as
                Cys, and TAT for residue 2023 as Gln
GENETICS
   #gene        FlyBase:crb
   ##cross-references FlyBase:FBgn0000368
CLASSIFICATION  #superfamily EGF homology
KEYWORDS        transmembrane protein
FEATURE
   691-722      #domain EGF homology #label EGF
SUMMARY         #length 2139 #molecular-weight 233619 #checksum 7230

Query Match 25.4%; Score 205; DB 2; Length 2139;
Best Local Similarity 34.9%; Pred. No. 1.40e-25;
Matches 29; Conservative 14; Mismatches 37; Indels 3; Gaps 3;

Db 1835 NIDECADQPCHNGGNCTDLIASYVCDCPEDYMGPQCDVLKQMTC-ENEPCRNGSTCQNGF 1893

I 1:1 : I::h I I : : I I 1 1 : : I I : I : || : MM 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPC-DQYECQNGA 59

Db 1894 N-ASTGNNFTCTCVPGFEGPLCD 1915 

: II I III II h 
Qy 60 QCIWQQEPTCRCPPGFAGPRCE 82 



RESULT 9
ENTRY           S78549 #type complete
TITLE           notch3 protein - human
ORGANISM        #formal_name Homo sapiens #common_name man
DATE            24-Jul-1998 #sequence_revision 24-Jul-1998 #text_change
                17-Mar-1999
ACCESSIONS      S78549; S71825
REFERENCE       S78549
   #authors     Joutel, A.; Tournier-Lasserve, E.
   #submission  submitted to the EMBL Data Library, April 1997
   #accession   S78549
   ##molecule_type mRNA
   ##residues   1-2321 ##label JOU1
   ##cross-references EMBL:U97669; NID:g2668591; PID:g2668592
REFERENCE       S71825
   #authors     Joutel, A.; Corpechot, C.; Ducros, A.; Vahedi, K.; Chabriat,
                H.; Mouton, P.; Alamowitch, S.; Domenga, V.; Cecillion, M.;
                Marechal, E.; Maciazek, J.; Vayssiere, C.; Cruaud, C.;
                Cabanis, E.A.; Ruchoux, M.M.; Weissenbach, J.; Bach, J.F.;
                Bousser, M.G.; Tournier-Lasserve, E.
   #journal     Nature (1996) 383:707-710
   #title       Notch3 mutations in CADASIL, a hereditary adult-onset
                condition causing stroke and dementia.
   #cross-references MUID:97032728
   #accession   S71825
   ##status     nucleic acid sequence not shown
   ##molecule_type DNA
   ##residues   67-113,138-194,268-333,'G',335-346,536-613,716-765,
                1240-1279,1815-1888 ##label JOU2
   ##cross-references EMBL:U97669
GENETICS
   #gene        notch.3
   #map_position 19p13.1
FUNCTION
   #description may be involved in pathogenesis of CADASIL, causing a type of
                stroke and dementia
CLASSIFICATION  #superfamily unassigned ankyrin repeat proteins; ankyrin
                repeat homology; EGF homology
KEYWORDS        tandem repeat; transmembrane protein
FEATURE
   318-349      #domain EGF homology #label EGF\
   1838-1870    #domain ankyrin repeat homology #label AN1\
   1871-1903    #domain ankyrin repeat homology #label AN2\
   1905-1937    #domain ankyrin repeat homology #label AN3\
   1938-1970    #domain ankyrin repeat homology #label AN4\
   1971-2003    #domain ankyrin repeat homology #label AN5
SUMMARY         #length 2321 #molecular-weight 243657 #checksum 3337

Query Match 25.4%; Score 205; DB 2; Length 2321;
Best Local Similarity 38.6%; Pred. No. 1.40e-25;
Matches 32; Conservative 20; Mismatches 23; Indels 8; Gaps 5;

Db 509 DECASTPCRNGAKCVDQPDGYECRCAEGFEGTLCDRN---VD-DCSP-DP--CHHG-RCV 560

Ml : IMII MM Ml I I : : 1 1 I :|:: I : II I MM M : 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62 

Db 561 DGIASFSCACAPGYTGTRCESQV 583
M MIMM III : 
Qy 63 WQQEPTCRCPPGFAGPRCEKLI 85 
RESULT 10
ENTRY           A35844 #type complete
TITLE           Xotch protein - African clawed frog
ORGANISM        #formal_name Xenopus laevis #common_name African clawed frog
DATE            12-Oct-1990 #sequence_revision 12-Oct-1990 #text_change
                14-Aug-1998
ACCESSIONS      A35844
REFERENCE       A35844
   #authors     Coffman, C.; Harris, W.; Kintner, C.
   #journal     Science (1990) 249:1438-1441
   #title       Xotch, the Xenopus homolog of Drosophila notch.
   #cross-references MUID:90385285
   #accession   A35844
   ##status     preliminary; nucleic acid sequence not shown; not
                compared with conceptual translation
   ##molecule_type mRNA
   ##residues   1-2524 ##label COF
CLASSIFICATION  #superfamily unassigned ankyrin repeat proteins; ankyrin
                repeat homology; EGF homology
KEYWORDS        transmembrane protein
FEATURE
   222-254      #domain EGF homology #label EGF\
   1924-1956    #domain ankyrin repeat homology #label AN1\
   1957-1989    #domain ankyrin repeat homology #label AN2\
   1991-2023    #domain ankyrin repeat homology #label AN3\
   2024-2056    #domain ankyrin repeat homology #label AN4\
   2057-2089    #domain ankyrin repeat homology #label AN5
SUMMARY         #length 2524 #molecular-weight 274931 #checksum 9441

Query Match 25.4%; Score 205; DB 2; Length 2524;
Best Local Similarity 42.5%; Pred. No. 1.40e-25;
Matches 34; Conservative 16; Mismatches 23; Indels 7; Gaps 4;

Db 947 NECASNPCKNGANCTDCVNSYTCTCQPGFSGIHCESNTPDCT-ESS-C--F---NGGTCI 999 

::! :: l::ll I I Ihlll ! Ill: II I ::| | : ||: 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62



Db 1000 DGINTFTCQCPPGFTGSYCQ 1019
Ihllllhl: |: 
Qy 63 WQQEPTCRCPPGFAGPRCE 82



RESULT 11 
ENTRY 
TITLE 
ORGANISM 
4&ATE 



S45306 ttype complete 
notch 3 protein - mouse 

tformaljiame Mus musculus tcommonjiame house mouse 
20-Feb-1995 ♦sequence.revision 20-Feb-1995 ttext change 

PlO-Jul-1998 
CESSIONS S45306 
REFERENCE S45306 

♦authors Lardelli, M. ; Dahlstrand, J.; Lendahl, u, 
♦journal Mech. Dev. (1994) 46:123-136 
♦title The novel Notch homologue mouse Notch 3 lacks specific 
epidermal growth factor-repeats and is expressed in 
proliferating neuroepithelium. 
♦cross-references MUID: 95001556 
♦accession S45306 

♦♦status preliminary 
ttmoleculejype mRNA 
ttresidues 1-2318 ttlabel LAR 
♦♦cross -references EMBL:X74760; NID:g483580; PID:g483581 . 
CLASSIFICATION ♦superfamily unassigned ankyrin repeat proteins; ankyrin 
repeat homology 



FEATURE
1839-1871 #domain ankyrin repeat homology #label AN1\
1872-1904 #domain ankyrin repeat homology #label AN2\
1906-1938 #domain ankyrin repeat homology #label AN3\
1939-1971 #domain ankyrin repeat homology #label AN4\
1972-2004 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2318 #molecular-weight 244245 #checksum 9358



Query Match 25.0%; Score 202; DB 2; Length 2318; 

Best Local Similarity 39.8%; Pred. No. 6.47e-25;

Matches 33; Conservative 19; Mismatches 23; Indels 8; Gaps 5; 

Db 510 DECASTPCRNGAKCVDQPDGYECRCAEGFEGTLCERN---VD-DCSP-DP--CHHG-RCV 561

1:1 : Ihll III: :|l I |::ll I :||: I : II I |::| :|: 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62

Db 562 DGIASFSCACAPGYTGIRCESQV 584 

:| hll::| III : 
Qy 63 WQQEPTCRCPPGFAGPRCEKLI 85



RESULT 12 

ENTRY A48825 #type fragment
TITLE Notch homolog Motch protein - mouse (fragment)
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 01-Dec-1993 #sequence_revision 18-Nov-1994 #text_change
14-Aug-1998
ACCESSIONS A48825
REFERENCE A48825
#authors Reaume, A.G.; Conlon, R.A.; Zirngibl, R.; Yamaguchi, T.P.;
Rossant, J.
#journal Dev. Biol. (1992) 154:377-387
#title Expression analysis of a Notch homologue in the mouse embryo.
#cross-references MUID:93050801
#accession A48825
##status preliminary; not compared with conceptual translation
##molecule_type mRNA
##residues 1-861 ##label REA
##experimental_source embryo
##note sequence extracted from NCBI backbone (NCBIP:119144)
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology; EGF homology
FEATURE
26-57 #domain EGF homology #label EGF
SUMMARY #length 861 #checksum 7963

Query Match 24.5%; Score 198; DB 2; Length 861; 

Best Local Similarity 40.0%; Pred. No. 4.94e-24;

Matches 32; Conservative 19; Mismatches 22; Indels 7; Gaps 4; 

Db 158 NECASNPCQNGANCTDCVDSYTCTCPVGFNGIHCENNTPDCT-ESS-C--F---NGGTCV 210

::| :: l-ll I I |::|ll II Ihl: II: I ::| I : II: I: 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62 

Db 211 DGINSFTCLCPPGFTGSYCQ 230

II HUM: |: 
Qy 63 WQQEPTCRCPPGFAGPRCE 82 



RESULT 13 

ENTRY A46019 #type complete
TITLE gene Notch-1 protein - mouse
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 22-Sep-1993 #sequence_revision 18-Nov-1994 #text_change
14-Aug-1998
ACCESSIONS A46019
REFERENCE A46019
#authors del Amo, F.F.; Gendron-Maguire, M.; Swiatek, P.J.; Jenkins,
N.A.; Copeland, N.G.; Gridley, T.
#journal Genomics (1993) 15:259-264
#title Cloning, analysis, and chromosomal localization of Notch-1,
mouse homolog of Drosophila Notch.
#cross-references MUID:93194170
#accession A46019
##status preliminary; not compared with conceptual translation
##molecule_type nucleic acid
##residues 1-2531 ##label DEL
##cross-references GB:Z11886; GB:S47228; NID:g288502; PID:g288503
##note sequence extracted from NCBI backbone (NCBIP:127318)
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology; EGF homology
FEATURE
757-788 #domain EGF homology #label EGF\
1917-1948 #domain ankyrin repeat homology #label AN1\
1949-1981 #domain ankyrin repeat homology #label AN2\
1983-2015 #domain ankyrin repeat homology #label AN3\
2016-2048 #domain ankyrin repeat homology #label AN4\
2049-2081 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2531 #molecular-weight 271312 #checksum 6611

Query Match 24.5%; Score 198; DB 2; Length 2531; 

Best Local Similarity 40.0%; Pred. No. 4.94e-24;

Matches 32; Conservative 19; Mismatches 22; Indels 7; Gaps 4;

Db 947 NECASNPCQNGANCTDCVDSYTCTCPVGFNGIHCENNTPDCT-ESS-C--F---NGGTCV 999

"I :: |::|| | | ::l|| |j |;|: |: | ::| | : || : : 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62 

Db 1000 DGINSFTCLCPPGFTGSYCQ 1019
II illlhh I: 
Qy 63 WQQEPTCRCPPGFAGPRCE 82



RESULT 14
ENTRY S18188 #type complete
TITLE notch protein homolog - rat
ORGANISM #formal_name Rattus norvegicus #common_name Norway rat
DATE 19-Feb-1994 #sequence_revision 10-Nov-1995 #text_change
12-Feb-1999
ACCESSIONS S18188
REFERENCE S18188
#authors Weinmaster, G.; Roberts, V.J.; Lemke, G.
#journal Development (1991) 113:199-205
#title A homolog of Drosophila Notch expressed during mammalian
development.
#cross-references MUID:92111383
#accession S18188
##molecule_type mRNA
##residues 1-2531 ##label WEI
##cross-references EMBL:X57405; NID:g57634; PID:g57635
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology



FEATURE
1917-1949 #domain ankyrin repeat homology #label AN1\
1950-1982 #domain ankyrin repeat homology #label AN2\
1984-2016 #domain ankyrin repeat homology #label AN3\
2017-2049 #domain ankyrin repeat homology #label AN4\
2050-2082 #domain ankyrin repeat homology #label AN5
SUMMARY #length 2531 #molecular-weight 270907 #checksum 2705



Query Match 24.5%; Score 198; DB 2; Length 2531;

Best Local Similarity 40.0%; Pred. No. 4.94e-24;

Matches 32; Conservative 18; Mismatches 23; Indels 7; Gaps 4;



Db 947 NECATNPCQNGANCTDCVDSYTCTCPTGFNGIHCENNTPDCT-ESS-C--F---NGGTCV 999
::| : |::|| I I |::||l II ||:|: II: I ::| I : ||: |: 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62 

Db 1000 DGINSFTCLCPPGFTGSYCQ 1019 

II lllll:|: I: 
Qy 63 WQQEPTCRCPPGFAGPRCE 82 



RESULT 15
ENTRY B49175 #type fragment
TITLE Motch A protein - mouse (fragment)
ALTERNATE_NAMES Notch homolog
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 21-Jan-1994 #sequence_revision 05-Jan-1996 #text_change
14-Aug-1998
ACCESSIONS B49175; PH1569; S32109
REFERENCE A49175
#authors Lardelli, M.; Lendahl, U.
#journal Exp. Cell Res. (1993) 204:364-372
#title Motch A and Motch B--two mouse Notch homologues coexpressed
in a wide variety of tissues.
#cross-references MUID:93178563
#accession B49175
##status preliminary; nucleic acid sequence not shown
##molecule_type mRNA
##residues 1-387 ##label LAR
##cross-references EMBL:X68278; NID:g287987; PID:g287988
##experimental_source embryo
##note sequence extracted from NCBI backbone (NCBIP:126159)
COMMENT This protein has many EGF repeats and lin-12/Notch repeats.
COMMENT This protein is one of the neurogenic proteins controlling the
decision between ectodermal and neural fate for cells in the
early embryo.
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology; EGF homology
FEATURE
27-58 #domain EGF homology #label EGF
SUMMARY #length 387 #checksum 5404

Query Match 24.4%; Score 197; DB 2; Length 387;

Best Local Similarity 35.8%; Pred. No. 8.20e-24;

Matches 29; Conservative 15; Mismatches 36; Indels 1; Gaps 1;

Db 25 NECLSQPCQNGGTCIDLTNSYKCSCPRGTQGVHCEINVDDCHPPLDPASRSPKCFNNGTC 84

::|::: h = h hi hi I Ihl I = II I : II : I

Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQY-ECQNGAQC 61

Db 85 VDQVGGYTCTCPPGFVGERCE 105

: II Mill I III
Qy 62 IWQQEPTCRCPPGFAGPRCE 82



Search completed: Fri May 28 09:35:38 1999
Job time : 22 secs.

Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 09:35:57 1999; MasPar time 5.04 Seconds

583.869 Million cell updates/sec 

Tabular output not generated. 

Title: >US-09-191-647-13 

Description: (1-104) from US09191647.pep
Perfect Score: 807

1 NNDDCVGHKCRHGAQCVDEV ITVNFVGKDSYVELASAKVR 104 



Scoring table: PAM 150 
Gap 11 

Searched: 77977 seqs, 28268293 residues 

Post-processing: Minimum Match 0%

Listing first 45 summaries 

Database: swiss-prot37 
1:swissprot

Statistics: Mean 37.259; Variance 53.097; scale 0.702 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
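
The header above names the Smith-Waterman algorithm as the search method. As a rough illustration only, the sketch below computes a Smith-Waterman local alignment score in Python. It is not the MPsrch implementation: it assumes a simple match/mismatch scheme and a linear gap penalty (the `match`, `mismatch`, and `gap` parameters are hypothetical stand-ins), whereas the run reported here used the PAM 150 table with a gap penalty of 11.

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences.

    Assumed scoring (illustration only, not PAM 150):
    identical residues score `match`, different residues score
    `mismatch`, and each gap position costs `gap`.
    """
    cols = len(target) + 1
    prev = [0] * cols          # previous row of the score matrix
    best = 0                   # highest cell value seen so far
    for i in range(1, len(query) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            curr[j] = max(
                0,                      # start a new local alignment
                prev[j - 1] + sub,      # align the two residues
                prev[j] + gap,          # gap in the target
                curr[j - 1] + gap,      # gap in the query
            )
            if curr[j] > best:
                best = curr[j]
        prev = curr
    return best


if __name__ == "__main__":
    # Seven identical residues embedded in a longer target: 7 * 2 = 14.
    print(smith_waterman("CRCPPGF", "XXCRCPPGFXX"))
```

Each inner-loop iteration fills one cell of the score matrix, which is the unit counted by the "cell updates/sec" figure in the header.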



              Query
 No.  Score   Match  Length  ID          Description             Pred. No.
   1    343   42.5    1480  SLIT_DROME  SLIT PROTEIN PRECURSOR  7.90e-66
   2    215   26.6    2444  NTC1_HUMAN  NEUROGENIC LOCUS NOTCH  6.84e-32
   3    214   26.5    1064  FBP1_STRPU  FIBROPELLIN I PRECURSO  1.23e-31
   4    209   25.9    2703  NOTC_DROME  NEUROGENIC LOCUS NOTCH  2.28e-30
   5    205   25.4    2139  CRB_DROME   CRUMBS PROTEIN PRECURS  2.33e-29
   6    205   25.4    2524  NOTC_XENLA  NEUROGENIC LOCUS NOTCH  2.33e-29
   7    202   25.0    2318  NTC3_MOUSE  NEUROGENIC LOCUS NOTCH  1.32e-28
   8    198   24.5    2531  NTC1_MOUSE  NEUROGENIC LOCUS NOTCH  1.33e-27
   9    198   24.5    2531  NTC1_RAT    NEUROGENIC LOCUS NOTCH  1.33e-27
  10    195   24.2    1429  LI12_CAEEL  LIN-12 PROTEIN PRECURS  7.46e-27
  11    193   23.9     570  FBP3_STRPU  FIBROPELLIN C PRECURSO  2.35e-26
  12    192   23.8    2437  NOTC_BRARE  NEUROGENIC LOCUS NOTCH  4.16e-26
  13    190   23.5    1408  SERR_DROME  SERRATE PROTEIN PRECUR  1.30e-25
  14    188   23.3     722  DLL1_MOUSE  DELTA-LIKE PROTEIN 1 P  4.08e-25
  15    188   23.3     723  DLL1_HUMAN  DELTA-LIKE PROTEIN 1 P  4.08e-25
  16    185   22.9     714  DLL1_RAT    DELTA-LIKE PROTEIN 1 P  2.24e-24
  17    185   22.9    1964  NTC4_MOUSE  NEUROGENIC LOCUS NOTCH  2.24e-24
  18    184   22.8     385  DLK_MOUSE   DELTA-LIKE PROTEIN PRE  3.95e-24
  19    175   21.7    4393  PGBM_HUMAN  BASEMENT MEMBRANE-SPEC  6.27e-22
  20    171   21.2     492  FA10_BOVIN  COAGULATION FACTOR X P  5.84e-21
  21    171   21.2     880  DL_DROME    NEUROGENIC LOCUS DELTA  5.84e-21
  22    171   21.2    4543  LRP1_CHICK  LOW-DENSITY LIPOPROTEI  5.84e-21
  23    165   20.4     475  FA10_CHICK  COAGULATION FACTOR X P  1.62e-19
  24    164   20.3     383  DLK_HUMAN   DELTA-LIKE PROTEIN PRE  2.80e-19
  25    163   20.2    3051  YNX3_CAEEL  HYPOTHETICAL PROTEIN T  4.86e-19
  26    162   20.1    1295  GLP1_CAEEL  GLP-1 PROTEIN PRECURSO  8.41e-19
  27    154   19.1     444  FA7_RABIT   COAGULATION FACTOR VII  6.55e-17
  28    152   18.8     461  FA9_HUMAN   COAGULATION FACTOR IX   1.93e-16
  29    152   18.8     488  FA10_HUMAN  COAGULATION FACTOR X P  1.93e-16
  30    150   18.6     407  FA7_BOVIN   COAGULATION FACTOR VII  5.64e-16
  31    150   18.6    3562  PGCV_CHICK  VERSICAN CORE PROTEIN   5.64e-16
  32    148   18.3     409  MFGM_PIG    MILK FAT GLOBULE-EGF F  1.64e-15
  33    147   18.2     427  MFGM_BOVIN  MILK FAT GLOBULE-EGF F  2.80e-15
  34    146   18.1    3707  PGBM_MOUSE  BASEMENT MEMBRANE-SPEC  4.77e-15
  35    144   17.8     643  UROM_BOVIN  UROMODULIN PRECURSOR (  1.38e-14
  36    144   17.8    1257  PGCN_RAT    NEUROCAN CORE PROTEIN   1.38e-14
  37    144   17.8    1268  PGCN_MOUSE  NEUROCAN CORE PROTEIN   1.38e-14
  38    143   17.7    4544  LRP1_HUMAN  LOW-DENSITY LIPOPROTEI  2.34e-14
  39    142   17.6     466  FA7_HUMAN   COAGULATION FACTOR VII  3.96e-14
  40    142   17.6    2871  FBN1_MOUSE  FIBRILLIN 1 PRECURSOR.  3.96e-14
  41    142   17.6    3358  PGCV_MOUSE  VERSICAN CORE PROTEIN   3.96e-14
  42    142   17.6    3396  PGCV_HUMAN  VERSICAN CORE PROTEIN   3.96e-14
  43    141   17.5     416  FA9_BOVIN   COAGULATION FACTOR IX   6.71e-14
  44    141   17.5     610  LEM2_HUMAN  E-SELECTIN PRECURSOR (  6.71e-14
  45    141   17.5    2871  FBN1_BOVIN  FIBRILLIN 1 PRECURSOR   6.71e-14



RESULT 1
ID SLIT_DROME STANDARD; PRT; 1480 AA.
AC P24014;
DT 01-MAR-1992 (REL. 21, CREATED)
DT 01-MAR-1992 (REL. 21, LAST SEQUENCE UPDATE)
DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE)
DE SLIT PROTEIN PRECURSOR.
GN SLI.
OS DROSOPHILA MELANOGASTER (FRUIT FLY).
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 91099665.
RA ROTHBERG J.M., JACOBS J.R., GOODMAN C.S., ARTAVANIS-TSAKONAS S.;
RT "Slit: an extracellular protein necessary for development of midline
RT glia and commissural axon pathways contains both EGF and LRR
RT domains.";
RL GENES DEV. 4:2169-2187(1990).
CC -!- FUNCTION: NECESSARY FOR DEVELOPMENT OF MIDLINE GLIA AND
CC COMMISSURAL AXON PATHWAYS. SLIT MAY INTERACT WITH EXTRACELLULAR
CC MATRIX MOLECULES.
CC -!- TISSUE SPECIFICITY: EXCRETED BY THE MIDLINE GLIA CELLS AND
CC EVENTUALLY DISTRIBUTED ALONG THE AXONS.
CC -!- ALTERNATIVE PRODUCTS: GIVES RISE TO 2 DISTINCT PROTEINS DIFFERING
CC BY 11 AA AT THE C-TERMINUS OF THE LAST EGF REPEAT.
CC -!- SIMILARITY: CONTAINS 7 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 22. TWO BLOCK OF 6 LRR'S
CC AND TWO BLOCKS OF 5 LRR'S.
CC -!- SIMILARITY: CONTAINS A C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK).
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
DR EMBL; X53959; G8615; -.
DR PIR; A36665; A36665.
DR FLYBASE; FBgn0003425; sli.
DR PROSITE; PS00010; ASX_HYDROXYL; 3.
DR PROSITE; PS00022; EGF_1; 7.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 5.
DR PROSITE; PS01187; EGF_CA; 2.
DR PROSITE; PS01225; CTCK_2; 1.
DR PFAM; PF00007; Cys_knot; 1.
DR PFAM; PF00008; EGF; 7.
DR PFAM; PF00054; laminin_G; 1.
DR PFAM; PF00560; LRR; 10.
DR HSSP; P00740; 1IXA.
KW NEUROGENESIS; GLYCOPROTEIN; SIGNAL; ALTERNATIVE SPLICING;
KW EGF-LIKE DOMAIN; REPEAT; LEUCINE-REPEAT; DUPLICATION.



FT   SIGNAL          1     36
FT   CHAIN          37   1480       SLIT PROTEIN.
FT   DOMAIN         70    104       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN        105    230       LEUCINE-RICH REPEATS (1ST REGION).
FT   DOMAIN        231    294       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN        295    326       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN        327    452       LEUCINE-RICH REPEATS (2ND REGION).
FT   DOMAIN        453    518       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN        519    550       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN        551    653       LEUCINE-RICH REPEATS (3RD REGION).
FT   DOMAIN        654    714       CONSERVED C-FLANKING REGION OF THE LRR.
FT   DOMAIN        715    746       CONSERVED N-FLANKING REGION OF THE LRR.
FT   DOMAIN        747    848       LEUCINE-RICH REPEATS (4TH REGION).
FT   DOMAIN        849    910       CONSERVED C-FLANKING REGION OF THE LRR.
FT   REPEAT        105    115       LRR 1-1.
FT   REPEAT        116    139       LRR 1-2.
FT   REPEAT        140    163       LRR 1-3.
FT   REPEAT        164    187       LRR 1-4.
FT   REPEAT        188    211       LRR 1-5.
FT   REPEAT        212    230       LRR 1-6.
FT   REPEAT        327    337       LRR 2-1.
FT   REPEAT        338    361       LRR 2-2.
FT   REPEAT        362    385       LRR 2-3.
FT   REPEAT        386    409       LRR 2-4.
FT   REPEAT        410    433       LRR 2-5.
FT   REPEAT        434    452       LRR 2-6.
FT   REPEAT        551    562       LRR 3-1.
FT   REPEAT        563    586       LRR 3-2.
FT   REPEAT        587    610       LRR 3-3.
FT   REPEAT        611    634       LRR 3-4.
FT   REPEAT        635    653       LRR 3-5.
FT   REPEAT        747    757       LRR 4-1.
FT   REPEAT        758    781       LRR 4-2.
FT   REPEAT        782    805       LRR 4-3.
FT   REPEAT        806    829       LRR 4-4.
FT   REPEAT        830    848       LRR 4-5.
FT   DOMAIN        907    944       EGF-LIKE 1.
FT   DOMAIN        946    983       EGF-LIKE 2.
FT   DOMAIN        985   1022       EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN       1024   1062       EGF-LIKE 4.
FT   DOMAIN       1064   1100       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN       1111   1149       EGF-LIKE 6.
FT   DOMAIN       1353   1392       EGF-LIKE 7.
FT   DOMAIN       1409   1480       CTCK.
FT   VARSPLIC     1394   1404       MISSING (IN SHORT FORM).
FT   CARBOHYD      111    111       POTENTIAL.
FT   CARBOHYD      207    207       POTENTIAL.
FT   CARBOHYD      357    357       POTENTIAL.
FT   CARBOHYD      435    435       POTENTIAL.
FT   CARBOHYD      783    783       POTENTIAL.
FT   CARBOHYD      788    788       POTENTIAL.
FT   CARBOHYD      958    958       POTENTIAL.
FT   CARBOHYD      998    998       POTENTIAL.
FT   CARBOHYD     1060   1060       POTENTIAL.
FT   CARBOHYD     1159   1159       POTENTIAL.
FT   CARBOHYD     1175   1175       POTENTIAL.
FT   CARBOHYD     1243   1243       POTENTIAL.
FT   CARBOHYD     1292   1292       POTENTIAL.
FT   DISULFID      911    922       BY SIMILARITY.
FT   DISULFID      916    932       BY SIMILARITY.
FT   DISULFID      934    943       BY SIMILARITY.
FT   DISULFID      950    961       BY SIMILARITY.
FT   DISULFID      955    971       BY SIMILARITY.
FT   DISULFID      973    982       BY SIMILARITY.
FT   DISULFID      989   1001       BY SIMILARITY.
FT   DISULFID      995   1010       BY SIMILARITY.
FT   DISULFID     1012   1021       BY SIMILARITY.
FT   DISULFID     1028   1041       BY SIMILARITY.
FT   DISULFID     1035   1050       BY SIMILARITY.
FT   DISULFID     1052   1061       BY SIMILARITY.
FT   DISULFID     1068   1079       BY SIMILARITY.
FT   DISULFID     1073   1088       BY SIMILARITY.
FT   DISULFID     1090   1099       BY SIMILARITY.
FT   DISULFID     1115   1125       BY SIMILARITY.
FT   DISULFID     1120   1137       BY SIMILARITY.
FT   DISULFID     1139   1148       BY SIMILARITY.
FT   DISULFID     1357   1368       BY SIMILARITY.
FT   DISULFID     1362   1380       BY SIMILARITY.
FT   DISULFID     1382   1391       BY SIMILARITY.
FT   DISULFID     1409   1443       BY SIMILARITY.
FT   DISULFID     1423   1457       BY SIMILARITY.
FT   DISULFID     1434   1473       BY SIMILARITY.
FT   DISULFID     1438   1475       BY SIMILARITY.
FT   DISULFID     1442   1479       BY SIMILARITY.
SQ   SEQUENCE 1480 AA; 165752 MW; 2CD1C421 CRC32;



Query Match 42.5%; Score 343; DB 1; Length 1480;

Best Local Similarity 42.2%; Pred. No. 7.90e-66;

Matches 46; Conservative 26; Mismatches 31; Indels 6; Gaps 5; 
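
The percentages in this statistics block can be reproduced from the other figures in the report. The Python sketch below is an inference from the reported numbers, not a documented MPsrch definition: it assumes Query Match is the score divided by the Perfect Score (807 for this query), and that Best Local Similarity is the number of matches divided by the total number of alignment columns. The function names are hypothetical.

```python
def query_match_percent(score, perfect_score):
    """Score as a percentage of the query's perfect (self-match) score."""
    return 100.0 * score / perfect_score


def best_local_similarity_percent(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all alignment columns.

    Assumes the alignment columns are exactly the sum of the four
    categories printed on the "Matches" line of the report.
    """
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


if __name__ == "__main__":
    # RESULT 1 (SLIT_DROME): Score 343, Perfect Score 807.
    print(round(query_match_percent(343, 807), 1))
    # Matches 46; Conservative 26; Mismatches 31; Indels 6.
    print(round(best_local_similarity_percent(46, 26, 31, 6), 1))
```

Under these assumptions the two calls give 42.5 and 42.2, which agree with the Query Match and Best Local Similarity values printed for this result.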

Db 1064 NIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMISMMYPQTSPCQNHECKHG 1123 

I III I : I : : I : III :| I I l|: ::| :|| I :|: Mill: || :| 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCE-HPP-PMVLLQTSPCDQYECQNG 58 

Db 1124 V-CFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELEPLRTR 1171 

I: I : III ll::| =11 I :::|| ::|:||| : : I 
Qy 59 AQCIWQ-QEPT--CRCPPGFAGPRCEKLITVNFVGKDSYVELASAKVR 104



RESULT 2 

ID NTC1_HUMAN STANDARD; PRT; 2444 AA.

AC P46531; 

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 

DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG 1 PRECURSOR (TRANSLOCATION- 

DE ASSOCIATED NOTCH PROTEIN TAN-1) (FRAGMENT).

GN NOTCH1 OR TAN1.

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 91347367. 

RA ELLISEN L.W., BIRD J., WEST D.C., SORENG A.L., REYNOLDS T.C.,

RA SMITH S.D., SKLAR J.; 

RT "TAN-1, the human homolog of the Drosophila notch gene, is broken by 

RT chromosomal translocations in T lymphoblastic neoplasms.";

RL CELL 66:649-661(1991). 

CC -!- FUNCTION: MAY BE IMPORTANT FOR NORMAL LYMPHOCYTE FUNCTION. IN 

CC ALTERED FORM, MAY CONTRIBUTE TO TRANSFORMATION OR PROGRESSION 

CC IN SOME T-CELL NEOPLASMS. 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- TISSUE SPECIFICITY: IN FETAL TISSUES MOST ABUNDANT IN SPLEEN,

CC BRAIN STEM AND LUNG. ALSO PRESENT IN MOST ADULT TISSUES WHERE IT

CC IS FOUND MAINLY IN LYMPHOID TISSUES.

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS. 

CC - 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
DR EMBL; M73980; G338675; -.
DR MIM; 190198; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 20.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 26.
DR PROSITE; PS01187; EGF_CA; 18.
DR PFAM; PF00008; EGF; 35.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.
FT   SIGNAL          1     18       POTENTIAL.
FT   CHAIN          19  >2444       NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG 1.
FT   DOMAIN         19   1736       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM     1737   1757       POTENTIAL.
FT   DOMAIN       1758  >2444       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN         20     58       EGF-LIKE 1.
FT   DOMAIN         59     99       EGF-LIKE 2.
FT   DOMAIN        102    139       EGF-LIKE 3.
FT   DOMAIN        140    176       EGF-LIKE 4.
FT   DOMAIN        178    216       EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN        218    255       EGF-LIKE 6.
FT   DOMAIN        257    293       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN        295    333       EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN        335    371       EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN        372    410       EGF-LIKE 10.
FT   DOMAIN        412    450       EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN        452    488       EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN        490    526       EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN        528    564       EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN        566    601       EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN        603    639       EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN        641    676       EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN        678    714       EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN        716    751       EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN        753    789       EGF-LIKE 20.
FT   DOMAIN       2404   2407       POLY-GLN.
FT   DOMAIN       2411   2418       POLY-PRO.
FT   DISULFID       24     37       BY SIMILARITY.
FT   DISULFID       31     46       BY SIMILARITY.
FT   DISULFID       48     57       BY SIMILARITY.
FT   DISULFID       63     74       BY SIMILARITY.
FT   DISULFID       68     87       BY SIMILARITY.
FT   DISULFID       89     98       BY SIMILARITY.
FT   DISULFID      106    117       BY SIMILARITY.
FT   DISULFID      111    127       BY SIMILARITY.
FT   DISULFID      129    138       BY SIMILARITY.
FT   DISULFID      144    155       BY SIMILARITY.
FT   DISULFID      149    164       BY SIMILARITY.
FT   DISULFID      166    175       BY SIMILARITY.
FT   DISULFID      182    195       BY SIMILARITY.
FT   DISULFID      189    204       BY SIMILARITY.
FT   DISULFID      206    215       BY SIMILARITY.
FT   DISULFID      222    233       BY SIMILARITY.
FT   DISULFID      227    243       BY SIMILARITY.
FT   DISULFID      245    254       BY SIMILARITY.
FT   DISULFID      261    272       BY SIMILARITY.
FT   DISULFID      266    281       BY SIMILARITY.
FT   DISULFID      283    292       BY SIMILARITY.
FT   DISULFID      299    312       BY SIMILARITY.
FT   DISULFID      306    321       BY SIMILARITY.
FT   DISULFID      323    332       BY SIMILARITY.
FT   DISULFID      339    350       BY SIMILARITY.
FT   DISULFID      344    359       BY SIMILARITY.
FT   DISULFID      361    370       BY SIMILARITY.
FT   DISULFID      376    387       BY SIMILARITY.
FT   DISULFID      381    398       BY SIMILARITY.
FT   DISULFID      400    409       BY SIMILARITY.
FT   DISULFID      416    429       BY SIMILARITY.
FT   DISULFID      423    438       BY SIMILARITY.
FT   DISULFID      440    449       BY SIMILARITY.
FT   DISULFID      456    467       BY SIMILARITY.
FT   DISULFID      461    476       BY SIMILARITY.
FT   DISULFID      478    487       BY SIMILARITY.
FT   DISULFID      494    505       BY SIMILARITY.
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DISULFID 
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514 


BY SIMILARITY. 
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DOMAIN 


791 


827 


EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL). 
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DISULFID 
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525 


BY SIMILARITY. 
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829 


868 


EGF-LIKE 22. 
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DISULFID 


532 


543 


BY SIMILARITY. 
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DOMAIN 


870 


906 


EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL). 
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DISULFID 


537 


552 


BY SIMILARITY. 
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908 


944 


EGF-LIKE 24. 
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DISULFID 


554 


563 


BY SIMILARITY. 
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DOMAIN 


946 


982 


EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL). 
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570 


580 


BY SIMILARITY. 
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DOMAIN 


984 


1020 


EGF-LIKE 26. 
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DISULFID 


575 


589 


BY SIMILARITY. 


1 


DOMAIN 


1022 


1058 


EGF-LIKE 27. 
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DISULFID 


591 


600 


BY SIMILARITY. 




DOMAIN ■ 


1060 


1096 


EGF-LIKE 28. 
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DISULFID 


607 


618 


BY SIMILARITY. 


FT 


DOMAIN 


1098 


1144 


EGF-LIKE 29. 
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DISULFID 


612 


627 


BY SIMILARITY. 
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DOMAIN 


1146 


1182 


EGF-LIKE 30. 
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DISULFID 


629 


638 


BY SIMILARITY. 


FT 


DOMAIN 


1184 


1220 


EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL). 
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DISULFID 


645 


655 


BY SIMILARITY. 
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DOMAIN 


1222 


1266 


EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL). 
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650 
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BY SIMILARITY. 
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1268 


1306 


EGF-LIKE 33. 
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DISULFID 


666 


675 


BY SIMILARITY. 
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1308 


1347 


EGF-LIKE 34. 
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DISULFID 


682 


693 


BY SIMILARITY. 
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DOMAIN 


1349 


1385 


EGF-LIKE 35. 
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DISULFID 


687 


702 


BY SIMILARITY. 
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DOMAIN 


1388 


1427 


EGF-LIKE 36. 
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DISULFID 
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713 


BY SIMILARITY. 
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DOMAIN 


1446 
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3 X LIN/NOTCH REPEATS. 


FT 
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BY SIMILARITY. 
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REPEAT 


1446 


1481 


LIN/NOTCH 1. 
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DISULFID 


725 


739 


BY SIMILARITY. 
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REPEAT 


1482 


1523 


LIN/NOTCH 2. 


FT 


DISULFID 


741 


750 


BY SIMILARITY. 
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REPEAT 


1524 
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LIN/NOTCH 3. 
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DISULFID 
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768 
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6 X ANK MOTIF REPEATS. 
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DISULFID 
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777 


BY SIMILARITY. 
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REPEAT 


1876 


1921 


ANK MOTIF 1. 
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DISULFID 


779 


788 


BY SIMILARITY. 


FT 


REPEAT 


1923 


1954 


ANK MOTIF 2. 
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DISULFID 
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BY SIMILARITY. 


FT 


REPEAT 
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1987 


ANK MOTIF 3. 
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DISULFID 
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815 


BY SIMILARITY. 
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REPEAT 
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2021 


ANK MOTIF 4. 
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DISULFID 


817 


826 


BY SIMILARITY. 
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REPEAT 


2023 


2054 


ANK MOTIF 5. 
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DISULFID 


833 


844 


BY SIMILARITY. 
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REPEAT 


2056 


2087 


ANK MOTIF 6. 
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DISULFID 


838 


855 


BY SIMILARITY. 


FT 
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1576 


1579 


POLY-VAL. 


FT 


DISULFID 


857 


867 


BY SIMILARITY.' 


FT 


DOMAIN 


1662 


1665 


POLY-ARG. 
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DISULFID 
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885 


BY SIMILARITY. 
FT DOMAIN 1729 1732 POLY-PRO.
FT DISULFID 879 894 BY SIMILARITY.
FT DOMAIN 1741 1744 POLY-ALA.
FT DISULFID 896 905 BY SIMILARITY.
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DOMAIN 


1902 
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POLY -GUI. 
FT DISULFID 912 923 BY SIMILARITY.
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POLY-GLY. 
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BY SIMILARITY. 
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943 


BY SIMILARITY. 
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BY SIMILARITY. 
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DISULFID 


993 
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BY SIMILARITY. 
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DISULFID 


1010 
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BY SIMILARITY. 
FT DISULFID 1026 1037 BY SIMILARITY.
FT DISULFID 1031 1046 BY SIMILARITY.
FT DISULFID 1048 1057 BY SIMILARITY.
FT DISULFID 1064 1075 BY SIMILARITY.
FT DISULFID 1069 1084 BY SIMILARITY.
FT DISULFID 1086 1095 BY SIMILARITY.
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DISULFID 


1102 
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BY SIMILARITY. 
FT DISULFID 1117 1132 BY SIMILARITY.
FT DISULFID 1134 1143 BY SIMILARITY.
FT DISULFID 1150 1161 BY SIMILARITY.
FT DISULFID 1155 1170 BY SIMILARITY.
FT DISULFID 1172 1181 BY SIMILARITY.


FT 


DISULFID 


1188 
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BY SIMILARITY. 


FT DISULFID 1193 1208 BY SIMILARITY.



Note: remainder of annotations omitted.






Query Match 26.6%; Score 215; DB 1; Length 2444;

Best Local Similarity 41.0%; Pred. No. 6.84e-32;
Matches 34; Conservative 18; Mismatches 24; Indels 7; Gaps 4; 



Db 948 NECASDPCRNGANCTDCVDSYTCTCPAGFSGIHCENNTPDCT-ESS-C--F---NGGTCV 1000

"I : Ihll I I |::||| II III: ||: I ::| | : ||: |: 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62

Db 1001 DGINSFTCLCPPGFTGSYCQHW 1023 

II HUN: I: : ' 
Qy 63 WQQEPTCRCPPGFAGPRCEKLI 85 
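
The percentages in the summary block can be cross-checked against the counts printed beside them. The following is a minimal Python sketch, assuming (this is inferred from the figures in this listing, not stated by the search program) that Best Local Similarity is the number of exact matches divided by the total number of alignment columns. The function name is hypothetical.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Return exact matches as a percentage of all alignment columns.

    Assumption: every alignment column is counted once as a match,
    a conservative substitution, a mismatch, or an indel.
    """
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Counts taken from the summary block above:
# Matches 34; Conservative 18; Mismatches 24; Indels 7
print(round(best_local_similarity(34, 18, 24, 7), 1))
```

Under that assumption the counts above give 34 of 83 columns, which rounds to the 41.0% reported.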



RESULT 3 

ID FBP1_STRPU STANDARD; PRT; 1064 AA.
AC P10079;
DT 01-MAR-1989 (REL. 10, CREATED)
DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)
DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)
DE FIBROPELLIN I PRECURSOR (EPIDERMAL GROWTH FACTOR-RELATED PROTEIN 1)
DE (UEGF-1).
GN EGF1.
OS STRONGYLOCENTROTUS PURPURATUS (PURPLE SEA URCHIN).
OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA;
OC EUECHINOIDEA; ECHINACEA; ECHINOIDA; STRONGYLOCENTROTIDAE;
OC STRONGYLOCENTROTUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 90112459.
RA DELGADILLO-REYNOSO M.G., ROLLO D.R., HURSH D.A., RAFF R.A.;
RT "Structural analysis of the uEGF gene in the sea urchin
RT strongylocentrotus purpuratus reveals more similarity to vertebrate
RT than to invertebrate genes with EGF-like repeats.";
RL J. MOL. EVOL. 29:314-327(1989).
RN [2]
RP SEQUENCE OF 279-476 AND 781-1064 FROM N.A.
RX MEDLINE; 87319677.
RA HURSH D.A., ANDREWS M.E., RAFF R.A.;
RT "A sea urchin gene encodes a polypeptide homologous to epidermal
RT growth factor.";
RL SCIENCE 237:1487-1490(1987).
RN [3]
RP AVIDIN-LIKE DOMAIN.
RX MEDLINE; 89196806.
RA HUNT L.T., BARKER W.C.;
RT "Avidin-like domain in an epidermal growth factor homolog from a sea
RT urchin.";
RL FASEB J. 3:1760-1764(1989).
RN [4]
RP CHARACTERIZATION.
RX MEDLINE; 91285254.
RA BISGROVE B.W., ANDREWS M.E., RAFF R.A.;
RT "Fibropellins, products of an EGF repeat-containing gene, form a
RT unique extracellular matrix structure that surrounds the sea urchin
RT embryo.";
RL DEV. BIOL. 146:89-99(1991).

CC -!- FUNCTION: FORM THE APICAL LAMINA, A COMPONENT OF THE EXTRACELLULAR
CC MATRIX.
CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR. IN VESICLES IN THE CYTOPLASM
CC OF UNFERTILIZED EGGS, THEN TO THE BASE OF THE HYALIN LAYER
CC THROUGHOUT DEVELOPMENT AND FINALLY IN THE APICAL LAMINA IN LATE
CC EMBRYOS AND EARLY LARVAE.
CC -!- DEVELOPMENTAL STAGE: MODERATE LEVELS IN UNFERTILIZED EGGS AND
CC DURING EARLY CLEAVAGE, THEN RAPIDLY INCREASES IN ABUNDANCE BETWEEN
CC LATE MORULA AND MESENCHYME BLASTULA STAGES TO MAXIMAL LEVELS
CC MAINTAINED THROUGH SUBSEQUENT STAGES. EXPRESSED BOTH MATERNALLY
CC AND ZYGOTICALLY.
CC -!- ALTERNATIVE PRODUCTS: TWO FORMS (IA AND IB) ARE PRODUCED BY
CC ALTERNATIVE SPLICING. THE SMALL FORM (IB) LACKS 8 EGF REPEATS.
CC -!- SIMILARITY: CONTAINS 21 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 1 CUB DOMAIN.
CC -!- SIMILARITY: THE C-TERMINAL DOMAIN OF THIS PROTEIN IS SIMILAR
CC TO AVIDIN/STREPTAVIDIN.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC

DR EMBL; L08692; G161467; -.
DR EMBL; L08692; G161466; -.
DR EMBL; X17530; G667061; -.
DR EMBL; M17421; G552260; -.
DR EMBL; X17533; G667062; -.
DR PIR; A29316; A29316.
DR PROSITE; PS00010; ASX_HYDROXYL; 19.
DR PROSITE; PS00022; EGF_1; 19.
DR PROSITE; PS00577; AVIDIN; 1.
DR PROSITE; PS01180; CUB; 1.
DR PROSITE; PS01186; EGF_2; 19.
DR PROSITE; PS01187; EGF_CA; 19.
DR PFAM; PF00008; EGF; 21.
DR PFAM; PF00431; CUB; 1.
DR HSSP; P01132; 1EPH.
KW BIOTIN; ALTERNATIVE SPLICING; EGF-LIKE DOMAIN; REPEAT; SIGNAL;
KW GLYCOPROTEIN.



FT SIGNAL 1 19 POTENTIAL.
FT CHAIN 20 1064 FIBROPELLIN I.
FT DOMAIN 20 55 EGF-LIKE 1.
FT DOMAIN 62 175 CUB.
FT DOMAIN 176 212 EGF-LIKE 2, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 214 250 EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 252 288 EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 290 326 EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 328 364 EGF-LIKE 6, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 366 402 EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 404 440 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 442 478 EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 480 516 EGF-LIKE 10, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 518 554 EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 556 592 EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 594 630 EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 632 668 EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 670 706 EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 708 744 EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 746 782 EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 784 820 EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 822 858 EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 860 896 EGF-LIKE 20.
FT DOMAIN 898 934 EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 936 1064 AVIDIN-LIKE.
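
Feature-table rows of this kind (line code, feature key, start, end, description) are straightforward to read programmatically. The sketch below is a hedged Python illustration, assuming one whitespace-separated feature per line as in the SWISS-PROT flat-file layout; the `Feature` type and `parse_ft_line` function are hypothetical names, not part of any database tool.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    key: str          # feature key, e.g. DOMAIN or DISULFID
    start: int        # first residue position (1-based)
    end: int          # last residue position (inclusive)
    description: str  # free-text description, possibly empty


def parse_ft_line(line: str) -> Feature:
    """Parse one 'FT KEY START END DESCRIPTION' row into a Feature."""
    parts = line.split(None, 4)
    if len(parts) < 4 or parts[0] != "FT":
        raise ValueError(f"not a complete FT row: {line!r}")
    description = parts[4].strip() if len(parts) > 4 else ""
    return Feature(parts[1], int(parts[2]), int(parts[3]), description)


# Rows taken from the fibropellin I feature table above.
cub = parse_ft_line("FT DOMAIN 62 175 CUB.")
egf2 = parse_ft_line("FT DOMAIN 176 212 EGF-LIKE 2, CALCIUM-BINDING (POTENTIAL).")

# Domain length in residues: both endpoints are inclusive.
print(cub.key, cub.end - cub.start + 1)
print(egf2.description)
```

Rows whose positions were lost in extraction will raise `ValueError` rather than being silently mis-parsed, which seems the safer choice for a listing of this quality.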


FT 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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BY SIMILARITY. 
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314 


BY SIMILARITY. 
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325 


BY SIMILARITY. 
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343 


BY SIMILARITY. 
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352 


BY SIMILARITY, 


FT 


DISULFID 
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BY SIMILARITY. 


FT


DISULFID 
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381 


BY SIMILARITY. 
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DISULFID 


375 


390 


BY SIMILARITY. 
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392 


401 


BY SIMILARITY. 


FT 


DISULFID 


408 


419 


BY SIMILARITY. 
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413 


428 


BY SIMILARITY, 
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DISULFID 
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439 


BY SIMILARITY. 
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457 


BY SIMILARITY. 
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451 


466 


BY SIMILARITY. 


FT 


DISULFID 


468 


477 


BY SIMILARITY. 
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484 
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BY SIMILARITY. 
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504 


BY SIMILARITY. 
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515 


BY SIMILARITY. 
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533 


BY SIMILARITY. 
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527 


542 


BY SIMILARITY. 
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544 


553 


BY SIMILARITY. 
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560 


571 


BY SIMILARITY. 
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DISULFID 


565 


580 


BY SIMILARITY. 


FT 


DISULFID 


582 


591 


BY SIMILARITY. 


FT 


DISULFID 


598 


609 


BY SIMILARITY. 


FT 
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SEQUENCE 
Query Match 26.5%; Score 214; DB 1; Length 1064;

Best Local Similarity 42.7%; Pred. No. 1.23e-31;

Matches 35; Conservative 13; Mismatches 27; Indels 7; 



Db 822 NIDECASDPCLNGGICVDGVNGFVCQCPPNYSGTYCE ISLDA- -CRSMPCQNGAT 874

II:!: I :|: III III: I II :ll :|| : |:: I lllll 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60

Db 875 CVNVGADYVCECVPGYAGQNCE 896

hi : II 11:11 II 
Qy 61 CIWQQEPTCRCPPGFAGPRCE 82



RESULT 4 

ID NOTC_DROME STANDARD; PRT; 2703 AA.
AC P07207; P04154;
DT 01-NOV-1986 (REL. 03, CREATED)
DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH PROTEIN PRECURSOR.

OS DROSOPHILA MELANOGASTER (FRUIT FLY).
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 86079539.
RA WHARTON K.A., JOHANSEN K.M., XU T., ARTAVANIS-TSAKONAS S.;
RT "Nucleotide sequence from the neurogenic locus notch implies a gene
RT product that shares homology with proteins containing EGF-like
RT repeats.";
RL CELL 43:567-581(1985).
RN [2]
RP SEQUENCE FROM N.A.
RC STRAIN=OREGON-R;
RX MEDLINE; 87064624.
RA KIDD S., KELLEY M.R., YOUNG M.W.;
RT "Sequence of the notch locus of Drosophila melanogaster: relationship
RT of the encoded protein to mammalian clotting and growth factors.";
RL MOL. CELL. BIOL. 6:3094-3108(1986).
RN [3]
RP SEQUENCE OF 2505-2611 FROM N.A.
RX MEDLINE; 85099329.
RA WHARTON K.A., YEDVOBNICK B., FINNERTY V.G., ARTAVANIS-TSAKONAS S.;
RT "opa: a novel family of transcribed repeats shared by the Notch locus
RT and other developmentally regulated loci in D. melanogaster.";
RL CELL 40:55-62(1985).
RN [4]
RP SEQUENCE OF 1-8 FROM N.A.
RX MEDLINE; 87257846.
RA KELLEY M.R., KIDD S., BERG R.L., YOUNG M.W.;
RT "Restriction of P-element insertions at the Notch locus of Drosophila
RT melanogaster.";
RL MOL. CELL. BIOL. 7:1545-1548(1987).
RN [5]
RP REVIEW.
RA HARRIS W.A.;
RT "Many cell types specified by Notch function.";
RL CURR. BIOL. 1:120-122(1991).

CC -!- FUNCTION: NOTCH PROTEIN IS ESSENTIAL FOR PROPER DIFFERENTIATION OF
CC ECTODERM.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- SEPARATION OF NEUROBLASTS FROM THE ECTODERM INTO THE INNER PART
CC OF EMBRYO IS ONE OF THE FIRST STEPS OF CNS DEVELOPMENT IN INSECTS.
CC THIS PROCESS IS UNDER CONTROL OF THE NEUROGENIC GENES.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC

DR EMBL; M16152; G157988; -.
DR EMBL; M16153; G157988; JOINED.
DR EMBL; M16149; G157988; JOINED.
DR EMBL; M16150; G157988; JOINED.
DR EMBL; M16151; G157988; JOINED.
DR EMBL; K03508; G157993; -.
DR EMBL; M13689; G157993; JOINED.
DR EMBL; K03507; G157993; JOINED.
DR EMBL; M12175; G950317; -.
DR EMBL; M16025; G157995; -.
DR PIR; A24420; A24420.
DR PIR; A24768; A24768.
DR PIR; A05267; A05267.
DR FLYBASE; FBgn0004647; N.
DR PROSITE; PS00010; ASX_HYDROXYL; 22.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 28.
DR PROSITE; PS01187; EGF_CA; 22.
DR PFAM; PF00008; EGF; 36.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.
Note: remainder of annotations omitted.



Query Match 25.9%; Score 209; DB 1; Length 2703; 

Best Local Similarity 40.7%; Pred. No. 2.28e-30;

Matches 33; Conservative 14; Mismatches 30; Indels 4; Gaps 2; 

Db 293 NYDDCLGHLCQNGGTCIDGISDYTCRCPPNFTGRFCQDD---VD-ECAQRDHPVCQNGAT 348

Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60

Db 349 CTNTHGSYSCICVNGWAGLDC 369 

I : :l I I II I 
Qy 61 CIWQQEPTCRCPPGFAGPRC 81 
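
Each result ends with semicolon-separated summary lines (Query Match, Score, DB, Length, and so on). As a rough illustration only, the Python sketch below splits such a line into name/value pairs. The regular expression and the `parse_summary` name are assumptions made for this example and are not part of the search program.

```python
import re

# A field is a label (letters, dots, spaces) followed by one numeric token,
# which may carry a percent sign or an exponent such as 2.28e-30.
FIELD = re.compile(r"\s*([A-Za-z. ]+?)\s+([-+0-9.eE%]+)\s*$")


def parse_summary(text):
    """Split 'Name value; Name value;' text into a dict of strings."""
    fields = {}
    for chunk in text.split(";"):
        match = FIELD.match(chunk)
        if match:  # skips the empty chunk after the final semicolon
            fields[match.group(1)] = match.group(2)
    return fields


# Summary lines as printed for RESULT 4 above.
header = parse_summary("Query Match 25.9%; Score 209; DB 1; Length 2703;")
stats = parse_summary("Best Local Similarity 40.7%; Pred. No. 2.28e-30;")
print(header["Score"], stats["Pred. No."])
```

Values are kept as strings so that the caller decides how to treat percentages and exponents. Lines still carrying extraction damage (a comma in place of a decimal point, for instance) will not parse cleanly and should be corrected first.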



RESULT 5

ID CRB_DROME STANDARD; PRT; 2139 AA.
AC P10040;
DT 01-MAR-1989 (REL. 10, CREATED)
DT 01-MAY-1991 (REL. 18, LAST SEQUENCE UPDATE)
DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE)
DE CRUMBS PROTEIN PRECURSOR (95F).
GN CRB.
OS DROSOPHILA MELANOGASTER (FRUIT FLY).
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=OREGON-R; TISSUE=EMBRYO;
RX MEDLINE; 90263104.
RA TEPASS U., THERES C., KNUST E.;
RT "Crumbs encodes an EGF-like protein expressed on apical membranes of
RT Drosophila epithelial cells and required for organization of
RT epithelia.";
RL CELL 61:787-799(1990).
RN [2]
RP SEQUENCE OF 1663-1955 FROM N.A.
RX MEDLINE; 87218537.
RA KNUST E., DIETRICH U., TEPASS U., BREMER K.A., WEIGEL D.,
RA VAESSIN H., CAMPOS-ORTEGA J.A.;
RT "EGF homologous sequences encoded in the genome of Drosophila
RT melanogaster, and their relation to neurogenic genes.";
RL EMBO J. 6:761-766(1987).

CC -!- FUNCTION: MAY PLAY A ROLE IN THE DEVELOPMENT OF EPITHELIA,
CC POSSIBLY FOR THE ESTABLISHMENT AND/OR MAINTENANCE OF CELL
CC POLARITY. IT MAY ACT AS A SIGNAL.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- PTM: PHOSPHORYLATED IN THE CYTOPLASMIC DOMAIN (POTENTIAL).
CC -!- SIMILARITY: CONTAINS 29 EGF-LIKE DOMAINS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

DR EMBL; M33753; G552087; ALT_SEQ.
DR EMBL; X05144; E1746; -.
DR EMBL; X05144; G929536; -.
DR PIR; B26637; B26637.
DR PIR; A35672; A35672.
DR FLYBASE; FBgn0000368; crb.
DR PROSITE; PS00010; ASX_HYDROXYL; 15.
DR PROSITE; PS00022; EGF_1; 26.
DR PROSITE; PS01186; EGF_2; 17.
DR PROSITE; PS01187; EGF_CA; 15.
DR PFAM; PF00008; EGF; 26.
DR PFAM; PF00054; laminin_G; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE;
KW GLYCOPROTEIN; SIGNAL; PHOSPHORYLATION.
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569 


BY SIMILARITY. 


FT DISULFID 571 580 BY SIMILARITY.
FT DISULFID 586 597 BY SIMILARITY.
FT DISULFID 591 602 BY SIMILARITY.
FT DISULFID 604 610 BY SIMILARITY.
FT DISULFID 613 624 BY SIMILARITY.
FT DISULFID 618 634 BY SIMILARITY.
FT DISULFID 636 645 BY SIMILARITY.
FT DISULFID 652 664 BY SIMILARITY.
FT DISULFID 659 673 BY SIMILARITY.
FT DISULFID 675 684 BY SIMILARITY.
FT DISULFID 691 702 BY SIMILARITY.
FT DISULFID 696 711 BY SIMILARITY.
FT DISULFID 713 722 BY SIMILARITY.
FT DISULFID 729 740 BY SIMILARITY.
FT DISULFID 734 749 BY SIMILARITY.
FT DISULFID 751 760 BY SIMILARITY.
FT DISULFID 767 778 BY SIMILARITY.
FT DISULFID 772 787 BY SIMILARITY.
FT DISULFID 789 799 BY SIMILARITY.
FT DISULFID 806 817 BY SIMILARITY.
FT DISULFID 811 826 BY SIMILARITY.
FT DISULFID 828 837 BY SIMILARITY.
FT DISULFID 844 855 BY SIMILARITY.
FT DISULFID 849 890 BY SIMILARITY.
FT DISULFID 892 901 BY SIMILARITY.
FT DISULFID 908 919 BY SIMILARITY.
FT DISULFID 913 928 BY SIMILARITY.
FT DISULFID 930 939 BY SIMILARITY.
FT DISULFID 946 957 BY SIMILARITY.
FT DISULFID 952 966 BY SIMILARITY.
FT DISULFID 968 977 BY SIMILARITY.
FT DISULFID 984 995 BY SIMILARITY.
FT DISULFID 989 1009 BY SIMILARITY.
FT DISULFID 1011 1020 BY SIMILARITY.
FT DISULFID 1211 1222 BY SIMILARITY.
FT DISULFID 1216 1231 BY SIMILARITY.
FT DISULFID 1233 1242 BY SIMILARITY.
FT DISULFID 1485 1496 BY SIMILARITY.
FT DISULFID 1490 1505 BY SIMILARITY.
FT DISULFID 1507 1516 BY SIMILARITY.
FT DISULFID 1763 1774 BY SIMILARITY.
FT DISULFID 1768 1783 BY SIMILARITY.
FT DISULFID 1785 1794 BY SIMILARITY.
FT DISULFID 1801 1812 BY SIMILARITY.
FT DISULFID 1806 1821 BY SIMILARITY.
FT DISULFID 1823 1832 BY SIMILARITY.
FT DISULFID 1839 1850 BY SIMILARITY.
FT DISULFID 1844 1859 BY SIMILARITY.
FT DISULFID 1861 1870 BY SIMILARITY.
FT DISULFID 1878 1889 BY SIMILARITY.
FT DISULFID 1883 1903 BY SIMILARITY.
FT DISULFID 1905 1914 BY SIMILARITY.
FT DISULFID 1919 1930 BY SIMILARITY.
FT DISULFID 1924 1939 BY SIMILARITY.
FT DISULFID 1941 1950 BY SIMILARITY.
FT DISULFID 1957 1968 BY SIMILARITY.
FT DISULFID 1962 1977 BY SIMILARITY.
FT DISULFID 1979 1988 BY SIMILARITY.
FT DISULFID 1995 2008 BY SIMILARITY.
FT DISULFID 2002 2017 BY SIMILARITY.
FT DISULFID 2019 2028 BY SIMILARITY.
FT CARBOHYD 37 37 POTENTIAL.
FT CARBOHYD 96 96 POTENTIAL.
FT CARBOHYD 198 198 POTENTIAL.
FT CARBOHYD 238 238 POTENTIAL.
FT CARBOHYD 239 239 POTENTIAL.
FT CARBOHYD 336 336 POTENTIAL.
FT CARBOHYD 400 400 POTENTIAL.
FT CARBOHYD 550 550 POTENTIAL.
FT CARBOHYD 565 565 POTENTIAL.
FT CARBOHYD 736 736 POTENTIAL.
FT CARBOHYD 746 746 POTENTIAL.
FT CARBOHYD 860 860 POTENTIAL.
FT CARBOHYD 884 884 POTENTIAL.
FT CARBOHYD 976 976 POTENTIAL.
FT CARBOHYD 1102 1102 POTENTIAL.
FT CARBOHYD 1114 1114 POTENTIAL.
FT CARBOHYD 1138 1138 POTENTIAL.
FT CARBOHYD 1192 1192 POTENTIAL.
FT CARBOHYD 1245 1245 POTENTIAL.
FT CARBOHYD 1255 1255 POTENTIAL.
FT CARBOHYD 1354 1354 POTENTIAL.
FT CARBOHYD 1363 1363 POTENTIAL.
FT CARBOHYD 1441 1441 POTENTIAL.
FT CARBOHYD 1454 1454 POTENTIAL.



Note: remainder of annotations omitted. 

Query Match 25.4%; Score 205; DB 1; Length 2139;

Best Local Similarity 34.9%; Pred. No. 2.33e-29;

Matches 29; Conservative 14; Mismatches 37; Indels 3; Gaps 3;

Db 1835 NIDECADQPCHNGGNCTDLIASYVCDCPEDYMGPQCDVLKQMTC-ENEPCRNGSTCQNGF 1893 

I 1:1 : h:|: I I : : I ! 1 1 : : I I : I : || : MM 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPC-DQYECQNGA 59



Db 1894 N-ASTGNNFTCTCVPGFEGPLCD 1915 

: II I III II I: 
Qy 60 QCIWQQEPTCRCPPGFAGPRCE 82 



RESULT 6 

ID NOTC_XENLA STANDARD; PRT; 2524 AA.
AC P21783;
DT 01-MAY-1991 (REL. 18, CREATED)
DT 01-OCT-1996 (REL. 34, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG PRECURSOR (XOTCH PROTEIN).
GN XOTCH.
OS XENOPUS LAEVIS (AFRICAN CLAWED FROG).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; AMPHIBIA; BATRACHIA; ANURA;
OC MESOBATRACHIA; PIPOIDEA; PIPIDAE; XENOPODINAE; XENOPUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 90385285.
RA COFFMAN C., HARRIS W., KINTNER C.;
RT "Xotch, the Xenopus homolog of Drosophila notch.";
RL SCIENCE 249:1438-1441(1990).
RN [2]
RP REVISIONS TO 1759-1782.
RA KINTNER C.;
RL SUBMITTED (JUN-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DEVELOPMENTAL STAGE: EXPRESSED ALMOST UNIFORMLY IN EARLY EMBRYOS.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; M33874; G1364263; -.
DR PIR; A35844; A35844.
DR PROSITE; PS00010; ASX_HYDROXYL; 23.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 29.
DR PROSITE; PS01187; EGF_CA; 21.
DR PFAM; PF00008; EGF; 36.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.


FT SIGNAL 1 19 POTENTIAL.
FT CHAIN 20 2524 NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG.
FT DOMAIN 20 1728 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 1729 1750 POTENTIAL.
FT DOMAIN 1751 2524 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 20 57 EGF-LIKE 1.
FT DOMAIN 58 99 EGF-LIKE 2.
FT DOMAIN 102 140 EGF-LIKE 3.
FT DOMAIN 141 177 EGF-LIKE 4.
FT DOMAIN 179 215 EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 217 254 EGF-LIKE 6.
FT DOMAIN 256 292 EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 294 332 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 334 370 EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 371 409 EGF-LIKE 10.
FT DOMAIN 411 449 EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 451 487 EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 489 525 EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 527 563 EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 565 600 EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 602 638 EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 640 675 EGF-LIKE 17.
FT DOMAIN 677 713 EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 715 750 EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 752 788 EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 790 826 EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 828 866 EGF-LIKE 22.
FT DOMAIN 868 904 EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 906 942 EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 944 980 EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 982 1018 EGF-LIKE 26.
FT DOMAIN 1020 1056 EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1058 1094 EGF-LIKE 28.
FT DOMAIN 1096 1142 EGF-LIKE 29.
FT DOMAIN 1144 1180 EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1182 1218 EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1220 1264 EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1266 1304 EGF-LIKE 33.
FT DOMAIN 1306 1346 EGF-LIKE 34.
FT DOMAIN 1347 1383 EGF-LIKE 35.
FT DOMAIN 1386 1424 EGF-LIKE 36.
FT DOMAIN 1441 1560 3 X LIN/NOTCH REPEATS.
FT REPEAT 1441 1478 LIN/NOTCH 1.
FT REPEAT 1479 1520 LIN/NOTCH 2.
FT REPEAT 1521 1560 LIN/NOTCH 3.
FT DOMAIN 1871 2083 6 X ANK MOTIF REPEATS.


FT DISULFID 22 35 BY SIMILARITY.
FT DISULFID 29 45 BY SIMILARITY.
FT DISULFID 47 56 BY SIMILARITY.
FT DISULFID 62 74 BY SIMILARITY.
FT DISULFID 68 87 BY SIMILARITY.
FT DISULFID 89 98 BY SIMILARITY.
FT DISULFID 106 117 BY SIMILARITY.
FT DISULFID 111 128 BY SIMILARITY.
FT DISULFID 130 139 BY SIMILARITY.
FT DISULFID 145 156 BY SIMILARITY.
FT DISULFID 150 165 BY SIMILARITY.
FT DISULFID 167 176 BY SIMILARITY.
FT DISULFID 183 194 BY SIMILARITY.
FT DISULFID 188 203 BY SIMILARITY.
FT DISULFID 205 214 BY SIMILARITY.
FT DISULFID 221 232 BY SIMILARITY.
FT DISULFID 226 242 BY SIMILARITY.
FT DISULFID 244 253 BY SIMILARITY.
FT DISULFID 260 271 BY SIMILARITY.
FT DISULFID 265 280 BY SIMILARITY.
FT DISULFID 282 291 BY SIMILARITY.
FT DISULFID 298 311 BY SIMILARITY.
FT DISULFID 305 320 BY SIMILARITY.
FT DISULFID 322 331 BY SIMILARITY.
FT DISULFID 338 349 BY SIMILARITY.
FT DISULFID 343 358 BY SIMILARITY.
FT DISULFID 360 369 BY SIMILARITY.
FT DISULFID 375 386 BY SIMILARITY.
FT DISULFID 380 397 BY SIMILARITY.
FT DISULFID 399 408 BY SIMILARITY.
FT DISULFID 415 428 BY SIMILARITY.
FT DISULFID 422 437 BY SIMILARITY.
FT DISULFID 439 448 BY SIMILARITY.
FT DISULFID 455 466 BY SIMILARITY.
FT DISULFID 460 475 BY SIMILARITY.
FT DISULFID 477 486 BY SIMILARITY.
FT DISULFID 493 504 BY SIMILARITY.
FT DISULFID 498 513 BY SIMILARITY.
FT DISULFID 515 524 BY SIMILARITY.
FT DISULFID 531 542 BY SIMILARITY.
FT DISULFID 536 551 BY SIMILARITY.
FT DISULFID 553 562 BY SIMILARITY.
FT DISULFID 569 579 BY SIMILARITY.
FT DISULFID 574 588 BY SIMILARITY.
FT DISULFID 590 599 BY SIMILARITY.
FT DISULFID 606 617 BY SIMILARITY.
FT DISULFID 611 626 BY SIMILARITY.
FT DISULFID 628 637 BY SIMILARITY.
FT DISULFID 644 654 BY SIMILARITY.
FT DISULFID 649 663 BY SIMILARITY.
FT DISULFID 665 674 BY SIMILARITY.
FT DISULFID 681 692 BY SIMILARITY.
FT DISULFID 686 701 BY SIMILARITY.
FT DISULFID 703 712 BY SIMILARITY.
FT DISULFID 719 729 BY SIMILARITY.
FT DISULFID 724 738 BY SIMILARITY.
FT DISULFID 740 749 BY SIMILARITY.
FT DISULFID 756 767 BY SIMILARITY.
FT DISULFID 761 776 BY SIMILARITY.
FT DISULFID 778 787 BY SIMILARITY.
FT DISULFID 794 805 BY SIMILARITY.
FT DISULFID 799 814 BY SIMILARITY.
FT DISULFID 816 825 BY SIMILARITY.
FT DISULFID 832 843 BY SIMILARITY.
FT DISULFID 837 854 BY SIMILARITY.
FT DISULFID 856 865 BY SIMILARITY.
FT DISULFID 872 883 BY SIMILARITY.
FT DISULFID 877 892 BY SIMILARITY.
FT DISULFID 894 903 BY SIMILARITY.
FT DISULFID 910 921 BY SIMILARITY.
FT DISULFID 915 930 BY SIMILARITY.
FT DISULFID 932 941 BY SIMILARITY.
FT DISULFID 986 997 BY SIMILARITY.
FT DISULFID 991 1006 BY SIMILARITY.
FT DISULFID 1008 1017 BY SIMILARITY.
FT DISULFID 1024 1035 BY SIMILARITY.
FT DISULFID 1029 1044 BY SIMILARITY.
FT DISULFID 1046 1055 BY SIMILARITY.
FT DISULFID 1062 1073 BY SIMILARITY.
FT DISULFID 1067 1082 BY SIMILARITY.
FT DISULFID 1084 1093 BY SIMILARITY.
FT DISULFID 1100 1121 BY SIMILARITY.
FT DISULFID 1115 1130 BY SIMILARITY.
FT DISULFID 1132 1141 BY SIMILARITY.
FT DISULFID 1148 1159 BY SIMILARITY.
FT DISULFID 1153 1168 BY SIMILARITY.
FT DISULFID 1170 1179 BY SIMILARITY.
FT DISULFID 1186 1197 BY SIMILARITY.
FT DISULFID 1191 1206 BY SIMILARITY.
FT DISULFID 1208 1217 BY SIMILARITY.
FT DISULFID 1224 1243 BY SIMILARITY.
FT DISULFID 1237 1252 BY SIMILARITY.
FT DISULFID 1254 1263 BY SIMILARITY.
FT DISULFID 1270 1283 BY SIMILARITY.
FT DISULFID 1275 1292 BY SIMILARITY.
FT DISULFID 1294 1303 BY SIMILARITY.
FT DISULFID 1310 1321 BY SIMILARITY.
FT DISULFID 1315 1333 BY SIMILARITY.
FT DISULFID 1335 1344 BY SIMILARITY.
FT DISULFID 1351 1362 BY SIMILARITY.
FT DISULFID 1356 1371 BY SIMILARITY.
FT DISULFID 1373 1382 BY SIMILARITY.
FT DISULFID 1390 1401 BY SIMILARITY.
FT DISULFID 1395 1412 BY SIMILARITY.
FT DISULFID 1414 1423 BY SIMILARITY.
FT CARBOHYD 462 462 POTENTIAL.
FT CARBOHYD 887 887 POTENTIAL.



Note: remainder of annotations omitted. 

Query Match 25.4%; Score 205; DB 1; Length 2524;

Best Local Similarity 42.5%; Pred. No. 2.33e-29;

Matches 34; Conservative 16; Mismatches 23; Indels 7; Gaps 4;

Db 946 NECASNPCKNGANCTDCVNSYTCTCQPGFSGIHCESNTPDCT-ESS-C--F---NGGTCI 998

-I l::ll I I Ihlll I III: I | ::| I : ||: || 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62 

Db 999 DGINTFTCQCPPGFTGSYCQ 1018 
11:11111:1: h 
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Qy 63 WQQEPTCRCPPGFAGPRCE 82 



RESULT 7 

ID NTC3_MOUSE STANDARD; PRT; 2318 AA.

AC Q61982; 

DT 01-NOV-1997 (REL. 35, CREATED) 

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH 3 PROTEIN.

GN NOTCH3.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=ICR X SWISS WEBSTER;

RX MEDLINE; 95001556.

RA LARDELLI M., DALSTRAND J., LENDAHL U.;

RT "The novel Notch homologue mouse Notch 3 lacks specific epidermal

RT growth factor-repeats and is expressed in proliferating
RT neuroepithelium.";
RL MECH. DEV. 46:123-136(1994).

CC -!- FUNCTION: NOTCH 1, 2 AND 3 PLAY A COMBINATIONAL ROLE DURING
CC VARIOUS CELL FATE DECISIONS AND MORPHOLOGICAL MOVEMENTS IN THE
CC DEVELOPING CNS AND PROBABLY OTHER REGIONS OF THE EMBRYO.

CC -!- TISSUE SPECIFICITY: PROLIFERATING NEUROEPITHELIUM.

CC -!- DEVELOPMENTAL STAGE: CNS DEVELOPMENT.

CC -!- SIMILARITY: CONTAINS 34 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 CDC10/SWI6 REPEATS.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; X74760; G483581; -. 

DR MGD; MGI:99460; NOTCH3.

DR PROSITE; PS00010; ASX_HYDROXYL; 18.

DR PROSITE; PS00022; EGF_1; 33.

DR PROSITE; PS01186; EGF_2; 27.

DR PROSITE; PS01187; EGF_CA; 17.

DR PFAM; PF00008; EGF; 33.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3.

DR HSSP; P00740; 1IXA.

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; EGF-LIKE DOMAIN;

KW GLYCOPROTEIN.



FT DOMAIN 1 1643 EXTRACELLULAR.
FT TRANSMEM 1644 1664 POTENTIAL.


FT 


DOMAIN 
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Note; remainder of annotations omitted. 

Query Match 25.0%; Score 202; DB 1; Length 2318; 

Best Local Similarity 39.8%; Pred. No. 1.32e-28; 

Matches 33; Conservative 19; Mismatches 23; Indels 8; Gaps 5; 

Db 510 DECASTPCRNGAKCVDQPDGYECRCAEGFEGTLCERN---VD-DCSP-DP--CHHG-RCV 561

1:1 : 11:1! Mi: :|| | :;' I :||: I : II I |::| :|: 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62 

Db 562 DGIASFSCACAPGYTGIRCESOV 584 

:| l:|l"l III : 
Qy 63 WQQEPTCRCPPGFAGPRCEKLI 85 



RESULT 8 

ID NTC1_MOUSE STANDARD; PRT; 2531 AA.
AC Q01705; 

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1 PRECURSOR (MOTCH PROTEIN).

GN NOTCH1 OR MOTCH.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN [1] 

RP SEQUENCE FROM N.A. 
RC TISSUE=EMBRYO;
RX MEDLINE; 93194170. 

RA FRANCO DEL AMO F., GENDRON-MAGUIRE M., SWIATEK P.J., JENKINS N.A.,
RA COPELAND N.G., GRIDLEY T.; 

RT "Cloning, analysis, and chromosomal localization of Notch-1, a mouse 
RT homolog of Drosophila Notch."; 
RL GENOMICS 15:259-264(1993). 
RN [2] 

RP SEQUENCE OF 1551-2170 FROM N.A.
RC TISSUE=EMBRYO;
RX MEDLINE; 93048835.

RA FRANCO DEL AMO F., SMITH D.E., SWIATEK P.J., GENDRON-MAGUIRE M.,

RA GREENSPAN R.J., MCMAHON A.P., GRIDLEY T.;

RT "Expression pattern of Motch, a mouse homolog of Drosophila Notch, 

RT suggests an important role in early postimplantation mouse 

RT development."; 

RL DEVELOPMENT 115:737-744(1992). 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DEVELOPMENTAL STAGE: EXPRESSED ALMOST UNIFORMLY IN EARLY EMBRYOS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; Z11886; G288503; -. 

DR MGD; MGI:97363; NOTCH1.

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 27.

DR PROSITE; PS01187; EGF_CA; 21.

DR PFAM; PF00008; EGF; 35. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN. 
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SEQUENCE 


2531 AA; 271312 MW; AD71189B 



Query Match 24.5%; Score 198; DB 1; Length 2531; 

Best Local Similarity 40.0%; Pred. No. 1.33e-27;

Matches 32; Conservative 19; Mismatches 22; Indels 7; Gaps 4; 

Db 947 NECASNPCQNGANCTDCVDSYTCTCPVGFNGIHCENNTPDCT-ESS-C--F---NGGTCV 999

::| :: h:|l I I h:|ll II ll:|: II: I ::| I : ||: |: 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62

Db 1000 DGINSFTCLCPPGFTGSYCQ 1019 

* II llllhl: I; 

Qy 63 WQQEPTCRCPPGFAGPRCE 82



RESULT 9 

ID NTC1_RAT STANDARD; PRT; 2531 AA.

AC Q07008; 

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1 PRECURSOR. 

GN NOTCH1. 

OS RATTUS NORVEGICUS (RAT). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=SCHWANN CELL; 

RX MEDLINE; 92111383. 

RA WEINMASTER G., ROBERTS V.J., LEMKE G.;

RT "A homolog of Drosophila Notch expressed during mammalian 

RT development."; 

RL DEVELOPMENT 113:199-205(1991). 

CC -!- FUNCTION: REQUIRED FOR THE CORRECT DIFFERENTIATION OF A NUMBER
CC OF TISSUES.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- DEVELOPMENTAL STAGE: IN THE EMBRYO, HIGHEST LEVELS OCCUR BETWEEN

CC DAYS 12 AND 14 AND DECREASE RAPIDLY TO MUCH LOWER LEVELS IN THE
CC ADULT.

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; X57405; G57635; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS00022; EGF_1; 35. 

DR PROSITE; PS01186; EGF_2; 26.

DR PROSITE; PS01187; EGF_CA; 21. 

DR PFAM; PF00008; EGF; 35. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN. 
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1010 


EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).




DOMAIN 


1221 


1265 


EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL). 




DOMAIN


1267 


1305 


EGF-LIKE 33.


FT 




1307 


1346 


EGF-LIKE 34.






1348 








DOMAIN 




1426 


l\it LIRt JO. 




bvPlnlK 


1449 


1462 


tio KItn. 


FT DOMAIN 1865 2076 6 X ANK MOTIF REPEATS.
FT REPEAT 1865 1910 ANK MOTIF 1.
FT REPEAT 1912 1942 ANK MOTIF 2.
FT REPEAT ANK MOTIF 3.
FT REPEAT 1978 2009 ANK MOTIF 4.
FT REPEAT 2011 2042 ANK MOTIF 5.
FT REPEAT 2044 2076 ANK MOTIF 6.


FT DISULFID 24 37 BY SIMILARITY.
FT DISULFID 31 46 BY SIMILARITY.
FT DISULFID 48 57 BY SIMILARITY.
FT DISULFID 63 74 BY SIMILARITY.
FT DISULFID 68 87 BY SIMILARITY.
FT DISULFID 89 98 BY SIMILARITY.
FT DISULFID 106 117 BY SIMILARITY.
FT DISULFID 111 127 BY SIMILARITY.






iiu 




Dl olMILAKIll. 




mcnr tin 




ikk 


Hi dIMILAKII I . 


J 


LlloUbrlU 


149 


164 


BY SIMILARITY. 




nTcnr pm 


166 




ol SlMILiAKITi . 




nrcni Pm 

UlOULllU 


182 


195 


ill iiMlLAKIll. 


FT 


UlSULtlL) 






BY SIMILARITY. 


FT DISULFID 206 215 BY SIMILARITY.
FT DISULFID 222 233 BY SIMILARITY.
FT DISULFID 227 243 BY SIMILARITY.
FT DISULFID 245 254 BY SIMILARITY.
FT DISULFID 261 272 BY SIMILARITY.
FT DISULFID 266 281 BY SIMILARITY.
FT DISULFID 283 292 BY SIMILARITY.
FT DISULFID 299 312 BY SIMILARITY.
FT DISULFID 306 321 BY SIMILARITY.
FT DISULFID 323 332 BY SIMILARITY.
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FT 


DISULFID 


339 


350 


BY SIMILARITY 


FT DISULFID 344 359 BY SIMILARITY.
FT DISULFID 361 370 BY SIMILARITY.
FT DISULFID 376 387 BY SIMILARITY.
FT DISULFID 381 398 BY SIMILARITY.
FT DISULFID 400 409 BY SIMILARITY.
FT DISULFID 416 429 BY SIMILARITY.
FT DISULFID 423 438 BY SIMILARITY.
FT DISULFID 440 449 BY SIMILARITY.
FT DISULFID 456 467 BY SIMILARITY.
FT DISULFID 461 476 BY SIMILARITY.
FT DISULFID 478 487 BY SIMILARITY.
FT DISULFID 494 505 BY SIMILARITY.
FT DISULFID 499 514 BY SIMILARITY.
FT DISULFID 516 525 BY SIMILARITY.
FT DISULFID 532 543 BY SIMILARITY.
FT DISULFID 537 552 BY SIMILARITY.
FT DISULFID 554 563 BY SIMILARITY.
FT DISULFID 570 580 BY SIMILARITY.
FT DISULFID 575 589 BY SIMILARITY.
FT DISULFID 591 600 BY SIMILARITY.
FT DISULFID 607 618 BY SIMILARITY.
FT DISULFID 612 627 BY SIMILARITY.
FT DISULFID 629 638 BY SIMILARITY.
FT DISULFID 645 655 BY SIMILARITY.
FT DISULFID 650 664 BY SIMILARITY.
FT DISULFID 666 675 BY SIMILARITY.
FT DISULFID 682 693 BY SIMILARITY.
FT DISULFID 687 702 BY SIMILARITY.
FT DISULFID 704 713 BY SIMILARITY.
FT DISULFID 720 730 BY SIMILARITY.
FT DISULFID 725 739 BY SIMILARITY.
FT DISULFID 741 750 BY SIMILARITY.
FT DISULFID 757 768 BY SIMILARITY.
FT DISULFID 762 777 BY SIMILARITY.
FT DISULFID 779 788 BY SIMILARITY.
FT DISULFID 795 806 BY SIMILARITY.
FT DISULFID 800 815 BY SIMILARITY.
FT DISULFID 817 826 BY SIMILARITY.
FT DISULFID 833 844 BY SIMILARITY.
FT DISULFID 838 855 BY SIMILARITY.
FT DISULFID 857 866 BY SIMILARITY.
FT DISULFID 873 884 BY SIMILARITY.
FT DISULFID 878 893 BY SIMILARITY.
FT DISULFID 895 904 BY SIMILARITY.
FT DISULFID 911 922 BY SIMILARITY.
FT DISULFID 916 931 BY SIMILARITY.
FT DISULFID 933 942 BY SIMILARITY.
FT DISULFID 987 998 BY SIMILARITY.
FT DISULFID 992 1007 BY SIMILARITY.
FT DISULFID 1009 1018 BY SIMILARITY.
FT DISULFID 1025 1036 BY SIMILARITY.
FT DISULFID 1030 1045 BY SIMILARITY.
FT DISULFID 1047 1056 BY SIMILARITY.
FT DISULFID 1063 1074 BY SIMILARITY.
FT DISULFID 1068 1083 BY SIMILARITY.
FT DISULFID 1085 1094 BY SIMILARITY.
FT DISULFID 1101 1122 BY SIMILARITY.
FT DISULFID 1116 1131 BY SIMILARITY.
FT DISULFID 1133 1142 BY SIMILARITY.
FT DISULFID 1149 1160 BY SIMILARITY.
FT DISULFID 1154 1169 BY SIMILARITY.
FT DISULFID 1171 1180 BY SIMILARITY.
FT DISULFID 1187 1198 BY SIMILARITY.
FT DISULFID 1192 1207 BY SIMILARITY.
FT DISULFID 1209 1218 BY SIMILARITY.
FT DISULFID 1225 1244 BY SIMILARITY.
FT DISULFID 1238 1253 BY SIMILARITY.
FT DISULFID 1255 1264 BY SIMILARITY.
FT DISULFID 1271 1284 BY SIMILARITY.
FT DISULFID 1276 1293 BY SIMILARITY.
FT DISULFID 1295 1304 BY SIMILARITY.
FT DISULFID 1311 1322 BY SIMILARITY.
FT DISULFID 1316 1334 BY SIMILARITY.
FT DISULFID 1336 1345 BY SIMILARITY.
FT DISULFID 1352 1363 BY SIMILARITY.
FT DISULFID 1357 1372 BY SIMILARITY.
FT DISULFID 1374 1383 BY SIMILARITY.
FT DISULFID 1391 1403 BY SIMILARITY.



Note: remainder of annotations omitted. 

Query Match 24.5%; Score 198; DB 1; Length 2531;

Best Local Similarity 40.0%; Pred. No. 1.33e-27; 

Matches 32; Conservative 18; Mismatches 23; Indels 7; Gaps 4;

Db 947 NECATNPCQNGANCTDCVDSYTCTCPTGFNGIHCENNTPDCT-ESS-C--F---NGGTCV 999

::| : |::|l I I l::lll II l|:|: II: I ::| I : ||: |: 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62

Db 1000 DGINSFTCLCPPGFTGSYCQ 1019 

II llllhl: I: 
Qy 63 WQQEPTCRCPPGFAGPRCE 82 



RESULT 10 

ID LI12_CAEEL STANDARD; PRT; 1429 AA.

AC P14585; 

DT 01-JAN-1990 (REL. 13, CREATED)

DT 01-JAN-1990 (REL. 13, LAST SEQUENCE UPDATE)

DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)

DE LIN-12 PROTEIN PRECURSOR.

GN LIN-12 OR R107.8.

OS CAENORHABDITIS ELEGANS. 

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA;

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS. 

RN [1] 

RP SEQUENCE FROM N.A.

RC STRAIN=BRISTOL N2;

RX MEDLINE; 88334747. 

RA YOCHEM J., WESTON K., GREENWALD I.; 

RT "The Caenorhabditis elegans lin-12 gene encodes a transmembrane 

RT protein with overall similarity to Drosophila Notch."; 

RL NATURE 335:547-550(1988).

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=BRISTOL N2; 

RX MEDLINE; 94150718. 

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FRASER A.,

RA FULTON L., GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M.,

RA JOHNSTON L., JONES M., KERSHAW J., KIRSTEN J., LAISSTER N.,

RA LATREILLE P., LIGHTNING J., LLOYD C., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SIMS M., SMALDON N., SMITH A., SMITH M., SONNHAMMER E., STADEN R.,

RA SULSTON J., THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K.,

RA WATERSON R., WATSON A., WEINSTOCK L., WILKINSON-SPROAT J.,

RA WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.

RT elegans.";

RL NATURE 368:32-38(1994). 

CC -!- FUNCTION: LIN-12 IS INVOLVED IN SEVERAL CELL FATES DECISIONS
CC THAT REQUIRES CELL-CELL INTERACTIONS. IT IS POSSIBLE THAT LIN-12
CC ENCODES A MEMBRANE-BOUND RECEPTOR FOR A SIGNAL THAT ENABLES
CC EXPRESSION OF THE VENTRAL UTERINE PRECURSOR CELL FATE.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- SIMILARITY: HIGH, TO C. ELEGANS GLP-1.
CC -!- SIMILARITY: CONTAINS 13 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 
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cc 


modified 


and this statement is not removed, Usage by and for commercial 


FT 


DISULFID 413 429 BY SIMILARITY. 


cc 


entities requires a license agreement (See http://www.isb-sib.ch/announce/ 


FT 


DISULFID 431 440 BY SIMILARITY. 


cc 
cc 


or send an email to license@isb-sib.ch).


FT 
FT 


DISULFID 507 518 BY SIMILARITY. 
DISULFID 512 529 BY SIMILARITY. 










DR 


EMBL; M12069; G156358; -. 




FT 


DISULFID 531 540 BY SIMILARITY, 


DR 


EMBL; Z14092; E1348691; -. 




FT 


DISULFID 547 558 BY SIMILARITY, 


DR 


PIR; S06434; S06434. 




FT 


DISULFID 552 567 BY SIMILARITY. 


DR 


WORMPEP; R107.8; 


CE00274, 




FT 


DISULFID 569 578 BY SIMILARITY. 


DR 


PROSITE; 


PS00010; 


ASX_HYDROXYL; 3.


FT 


DISULFID 586 597 BY SIMILARITY, 


DR 


PROSITE; PS00022; 


EGF_1; 12.




FT 


DISULFID 591 607 BY SIMILARITY. 


DR 


PROSITE; PS01186; EGF_2; 11.




FT 


DISULFID 609 618 BY SIMILARITY. 


DR 


PROSITE; PS01187; 


EGF_CA; 2.




FT 


CARBOHYD 41 41 POTENTIAL. 


DR 


PFAM; PF00008; EGF; 13.




FT 


CARBOHYD 165 165 POTENTIAL. 


DR 


PFAM; PF00023; ank; 4. 




FT 


CARBOHYD 194 194 POTENTIAL. 


DR 


PFAM; PF00066; notch; 3.




FT


CARBOHYD 378 378 POTENTIAL. 


DR 


HSSP; P00740; 1IXA.




FT 


CARBOHYD 515 515 POTENTIAL. 


KW 


DIFFERENTIATION; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE; 


FT 


CARBOHYD 623 623 POTENTIAL, 


KW 


GLYCOPROTEIN; SIGNAL. 




FT 


CARBOHYD 751 751 POTENTIAL, 


FT 


SIGNAL 


1 


15 


POTENTIAL. 


FT 


CARBOHYD 754 754 POTENTIAL, 


FT 


CHAIN 


16 


1429 


LIN-12 PROTEIN.


FT 


CARBOHYD 900 900 POTENTIAL. 




DOMAIN 


16 


908 


EXTRACELLULAR (POTENTIAL). 


SQ


SEQUENCE 1429 AA; 157115 MW; CFD2CCA4 CRC32; 


1 


TRANSMEM 


909


931 


POTENTIAL, 






¥ 


DOMAIN 


932 


1429 


CYTOPLASMIC (POTENTIAL), 


Query Match 24.2%; Score 195; DB 1; Length 1429; 


FT 


DOMAIN 


24 


618 


13 X EGF-TYPE REPEATS. 


Best Local Similarity 30.1%; Pred. No. 7.46e-27;


FT 


DOMAIN 


631 


750 


3 X LIN/NOTCH REPEATS, 


Matches 28; Conservative 27; Mismatches 29; Indels 9; Gaps 4; 


FT 


DOMAIN 


1046 


1266 


6 X ANK MOTIF REPEATS. 






FT 


DOMAIN


20 


61 


EGF-LIKE 1.


Db 


365 DKNECLSENMCLNNGTCVNLPGSFRCDCARGFGGKWCDEP LNM-CQDFHCENDG 417 

::l : : II: :: 1 |::||:| h 1 1 |::: hi : 


FT 


DOMAIN 


114 


150 


EGF-LIKE 2.




FT 


DOMAIN 


152 


190 


EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL). 


Qy


1 NNDDCVG-HKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGA 59 


FT 


DOMAIN


201 


246 


EGF-LIKE 4.






FT 


DOMAIN 


250 


285 


EGF-LIKE 5.


Db 


418 TCMHTSDHSPVCQCKNGFIGKRCEKECPIGFGG 450 


FT 


DOMAIN 


287 


323 


EGF-LIKE 6. 




I: :: 1 hi II 1 Mil : 1 1 


FT 


DOMAIN 


323 


363 


EGF-LIKE 7, 


Qy 


60 QCI * WQQEPTCRCPPGFAGPRCEKLITVNFVG 91 


FT 


DOMAIN 


365 


402 


EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL), 






FT 


DOMAIN 




441 


EGF -LIKE 9. 






FT 


DOMAIN 


449 


492 


EGF-LIKE 10. 


RESULT 11 


FT 


DOMAIN 


503 


541 


EGF-LIKE 11, 


ID 


FBP3_STRPU STANDARD; PRT; 570 AA. 


FT 






III 


EGF-LIKE 12.


AC 


P49013; 


FT 


DOMAIN 


582 


619 


EGF-LIKE 13.


DT 


01-FEB-1996 (REL. 33, CREATED) 


FT 


REPEAT 


635 


669 


LIN/NOTCH 1, 


DT 


01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE) 


FT 




670 


710 


LIN/NOTCH 2, 


DT 


01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE) 


FT 


KfcrliAl 


inU 


750 


LIN/NOTCH 3. 


DE 


FIBROPELLIN C PRECURSOR (EPIDERMAL GROWTH FACTOR-RELATED PROTEIN 3) 


FT 




1046 


1078 


ANK MOTIF 1.


DE 


(EGF III) (FIBROPELLIN III). 


FT 


REPEAT 


1079 


1119 


ANK MOTIF 2.


GN 


EGF3. 


FT 


REPEAT 






ANK MOTIF 3.


OS 


STRONGYLOCENTROTUS PURPURATUS (PURPLE SEA URCHIN). 


FT 


REPEAT 


1153 


1188 


ANK MOTIF 4.


OC 


EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; 


FT 


REPEAT 


1189 


1232 


ANK MOTIF 5, 


OC 


EUECHINOIDEA; ECHINACEA; ECHINOIDA; STRONGYLOCENTROTIDAE; 


FT 


REPEAT 


1233 


1266 


ANK MOTIF 6. 


OC 


STRONGYLOCENTROTUS. 


FT 


DISULFID 


24 


35 


BY SIMILARITY. 


RN 


[1] 


FT


DISULFID 


29 


49 


BY SIMILARITY. 


RP 


SEQUENCE FROM N.A. 


FT


DISULFID 


51 


60 


BY SIMILARITY. 


RC 


TISSUE=GASTRULA;


FT


DISULFID 


118 


129 


BY SIMILARITY. 


RX 


MEDLINE; 93273088. 


FT 


DISULFID 


123 


138 


BY SIMILARITY. 


RA 


BISGROVE B.W., RAFF R.A.; 


FT 


DISULFID 


140 


149 


BY SIMILARITY. 


RT 


"The SpEGF III gene encodes a member of the fibropellins: EGF repeat- 


FT 


DISULFID 


156 


169 


BY SIMILARITY. 


RT 


containing proteins that form the apical lamina of the sea urchin 
embryo.";


FT 


DISULFID 


163 


178 


BY SIMILARITY, 


RT 


FT 


DISULFID 


180 


189 


BY SIMILARITY. 


RL 


DEV. BIOL. 157:526-538(1993), 


FT 


DISULFID 


205 


227 


BY SIMILARITY. 


CC 


-!- FUNCTION: FORM THE APICAL LAMINA, A COMPONENT OF THE EXTRACELLULAR 


FT 


DISULFID 


221 


234 


BY SIMILARITY. 


CC 


MATRIX. 


FT 


DISULFID 


236 


245 


BY SIMILARITY. 


CC 


-!- SUBCELLULAR LOCATION: EXTRACELLULAR. 


FT 


DISULFID 


254 


264 


BY SIMILARITY. 


CC 


-!- DEVELOPMENTAL STAGE: LOW LEVELS IN UNFERTILIZED EGGS AND DURING


FT 


DISULFID 


259 


273 


BY SIMILARITY, 


CC 


EARLY CLEAVAGE, THEN RAPIDLY INCREASES IN ABUNDANCE BETWEEN LATE 


FT 


DISULFID 


275 


284 


BY SIMILARITY. 


CC 


MORULA AND MESENCHYME BLASTULA STAGES TO MAXIMAL LEVELS MAINTAINED 


FT 


DISULFID 


291 


302 


BY SIMILARITY. 


CC 


THROUGH SUBSEQUENT STAGES. 


FT 


DISULFID 


296 


311 


BY SIMILARITY. 


cc 


-!- EXPRESSED BOTH MATERNALLY AND ZYGOTICALLY.


FT 


DISULFID 


313 


322 


BY SIMILARITY, 


cc 


-!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS.


FT 


DISULFID 


327 


339 


BY SIMILARITY. 


cc 


-!- SIMILARITY: CONTAINS 1 CUB DOMAIN. 


FT 


DISULFID 


334 


351 


BY SIMILARITY. 


cc 


-!- SIMILARITY: THE C-TERMINAL DOMAIN OF THIS PROTEIN IS SIMILAR


FT 


DISULFID 


353 


362 


BY SIMILARITY. 


cc 


TO AVIDIN/STREPTAVIDIN. 


FT 


DISULFID 


369 


381 


BY SIMILARITY, 


cc 






FT 


DISULFID 


375 


390 


BY SIMILARITY. 


cc 


This SWISS-PROT entry is copyright. It is produced through a collaboration


FT 


DISULFID 


392 


401 


BY SIMILARITY, 


cc 


between the Swiss Institute of Bioinformatics and the EMBL outstation - 


FT 


DISULFID 


408 


419 


BY SIMILARITY. 


cc 


the European Bioinformatics Institute. There are no restrictions on its 
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cc 


use by 


non-profit institutions as long as its content is in no way 


cc 


modified and this statement is not removed. Usage by and for commercial 


cc 


entities requires a license agreement (See http://www.isb-sib.ch/announce/ 


cc 
cc 

DR 


or send an email to license@isb-sib.ch).


EMBL; L07045; G310660; -. 




DR 


PROSITE; PS00010; ASX_HYDROXYL; 8.


DR 


PROSITE; 


PS00022; EGF_1; 8. 




DR 
DR 


PROSITE; PS00577; AVIDIN; 1. 
PROSITE; PS01180; CUB; 1. 




DR 


PROSITE; 


PS01186; EGF_2; 7.




DR 


PROSITE; PS01187; EGF_CA; 6.




DR 


PFAM; PF00008; EGF; 8.




DR 


PFAM; PF00431; CUB; 1.




DR 


HSSP; P00740; 1IXA, 




KW 


BIOTIN; EGF-LIKE DOMAIN; REPEAT; SIGNAL; GLYCOPROTEIN, 


FT SIGNAL 1 17 POTENTIAL.
FT CHAIN 18 570 FIBROPELLIN C.
FT DOMAIN 18 55 EGF-LIKE 1.
FT DOMAIN 62 175 CUB.
FT DOMAIN 176 212 EGF-LIKE 2, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 214 250 EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 252 288 EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 290 326 EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 328 364 EGF-LIKE 6, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 366 402 EGF-LIKE 7.
FT DOMAIN 404 440 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 442 570 AVIDIN-LIKE.
FT DISULFID 23 34 BY SIMILARITY.
FT DISULFID 28 43 BY SIMILARITY.
FT DISULFID 45 54 BY SIMILARITY.
FT DISULFID 180 191 BY SIMILARITY.
FT DISULFID 185 200 BY SIMILARITY.
FT DISULFID 202 211 BY SIMILARITY.


FT 


DISULFID 
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SEQUENCE 
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61116 MW 


265BC4BB CRC32; 



Query Match 23.9%; Score 193; DB 1; Length 570;

Best Local Similarity 41.2%; Pred. No. 2.35e-26; 

Matches 35; Conservative 12; Mismatches 31; Indels 7; Gaps 5;



Db 176 DGDDCTPNPCLNGATCVDQVNDYQCICAPGFTGDNCE-TD-I-D-E-CASAPCRNGGA 228 

: III : I HI llhll I III: 1 1 : 1 II : : I 1 : 1 1 : 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60 

Db 229 CVDQVNGYTCNCIPGFNGVNCENNI 253

I: II I III I II: I 
Qy 61 CIWQQEPTCRCPPGFAGPRCEKLI 85 



RESULT 12 

ID NOTC_BRARE STANDARD; PRT; 2437 AA.
AC P46530; 



DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE) 

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN PRECURSOR.

GN NOTCH. 

OS BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII; 

OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA;

OC CYPRINIDAE; RASBORINAE; DANIO. 

RN [1] 

RP SEQUENCE FROM N.A.

RC TISSUE=EMBRYO;

RX MEDLINE; 94128602.

RA BIERKAMP C., CAMPOS-ORTEGA J.A.;

RT "A zebrafish homologue of the Drosophila neurogenic gene Notch and 

RT its pattern of transcription during early embryogenesis."; 

RL MECH. DEV. 43:87-100(1993).

CC -!- FUNCTION: IMPLICATED IN CELL FATE SPECIFICATIONS DURING
CC EMBRYO DEVELOPMENT. MAY BE INVOLVED IN THE FORMATION OF THE
CC NEURAL PLATE, NOTOCHORD AND BRAIN VESICLES.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- DEVELOPMENTAL STAGE: EXPRESSED IN ALL CELLS IN PREGASTRULATION
CC STAGES. DURING GASTRULATION IS DIFFERENTIALLY EXPRESSED,
CC ACCUMULATING PREDOMINANTLY IN THE PRECHORDAL MESODERM AND
CC NOTOCHORD. AT THE END OF GASTRULATION, EXPRESSED ALONG THE
CC ANTERIOR-POSTERIOR AXIS INCLUDING THE DEVELOPING NEURAL PLATE
CC AND DIFFERENTIATING MESODERM. ALSO PRESENT IN THE DEVELOPING
CC BRAIN AND HEAD REGIONS.

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; X69088; G433867; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 23.

DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 28.

DR PROSITE; PS01187; EGF_CA; 22.

DR PFAM; PF00008; EGF; 36.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3.

DR HSSP; P00740; 1IXA.

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN. 
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LIN/NOTCH 2. 

LIN/NOTCH 3. 

6 X ANK MOTIF REPEATS. 

ANK MOTIF 1. 
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Note: remainder of annotations omitted. 

Query Match 23.8%; Score 192; DB 1; Length 2437;

Best Local Similarity 35.8%; Pred. No. 4.16e-26;

Matches 29; Conservative 16; Mismatches 35; Indels 1; Gaps 1; 

Db 1183 NECLSQPCQNGGTCIDLVNTYKCSCPRGTQGVHCEIDIDDCSPSVDPLTGEPRCFNGGRC 1242 

::|::: |::|: hi II I I lh! h II I : I ||::| 

Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSP-CDQYECQNGAQC 61 

Db 1243 VDRVGGYGCVCPAGFVGERCE 1263 

: I 1:1 I III 

Qy 62 IWQQEPTCRCPPGFAGPRCE 82 



RESULT 13

ID SERR_DROME STANDARD; PRT; 1408 AA.
AC P18168; 
DT 01-NOV-1990 (REL. 16, CREATED) 
DT 01-JUL-1993 (REL. 26, LAST SEQUENCE UPDATE) 
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647-13. rsp 
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DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE) 

DE SERRATE PROTEIN PRECURSOR (BEADED PROTEIN). 

GN SER OR BD. 

OS DROSOPHILA MELANOGASTER (FRUIT FLY). 

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=OREGON-R;

RX MEDLINE; 91347903. 

RA THOMAS U., SPEICHER S.A., KNUST E.; 

RT "The Drosophila gene Serrate encodes an EGF-like transmembrane 

RT protein with a complex expression pattern in embryos and wing 

RT discs."; 

RL DEVELOPMENT 111:749-761(1991). 

RN [2]

RP SEQUENCE FROM N.A.

RX MEDLINE; 91099666.

RA FLEMING R.J., SCOTTGALE T.H., DIEDERICH R.J., ARTAVANIS-TSAKONAS S.;

RT "The gene Serrate encodes a putative EGF-like transmembrane protein 

RT essential for proper ectodermal development in Drosophila 

RT melanogaster.";

RL GENES DEV. 4:2188-2201(1990). 

CC -!- FUNCTION: ESSENTIAL FOR PROPER ECTODERMAL DEVELOPMENT. SERRATE
CC MAY REPRESENT AN ELEMENT IN A NETWORK OF INTERACTING MOLECULES
CC OPERATING AT THE CELL SURFACE DURING THE DIFFERENTIATION OF
CC CERTAIN TISSUES.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- TISSUE SPECIFICITY: APPEARS TO BE RESTRICTED EXCLUSIVELY TO

CC CELLS OF ECTODERMAL ORIGIN.

CC -!- SEPARATION OF NEUROBLASTS FROM THE ECTODERM INTO THE INNER PART

CC OF EMBRYO IS ONE OF THE FIRST STEPS OF CNS DEVELOPMENT IN INSECTS.

CC THIS PROCESS IS UNDER CONTROL OF THE NEUROGENIC GENES.

CC -!- NOTCH AND SERRATE MAY INTERACT AT THE PROTEIN LEVEL. IT IS

CC CONCEIVABLE THAT THE SERRATE AND DELTA PROTEINS MAY COMPETE

CC FOR BINDING WITH THE NOTCH PROTEIN.

CC -!- SIMILARITY: CONTAINS 14 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: TO THE DROSOPHILA NEUROGENIC LOCUS DELTA PROTEIN.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).



DR EMBL; X56811; G8564; -. 

DR EMBL; M35759; G158606; -. 

DR PIR; A36666; A36666. 

DR PIR; S16878; S16878. 

DR FLYBASE; FBgn0004197; Ser. 

DR PROSITE; PS00010; ASX_HYDROXYL; 7.

DR PROSITE; PS00022; EGF_1; 14.

DR PROSITE; PS01186; EGF_2; 8.

DR PROSITE; PS01187; EGF_CA; 5. 

DR PFAM; PF00008; EGF; 11. 

DR HSSP; P00743; 1WHE. 

KW DIFFERENTIATION; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE; 

KW GLYCOPROTEIN; SIGNAL. 
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CTMTT SDTTV 


FT   DISULFID    479    488       BY SIMILARITY.
FT   DISULFID    495    506       BY SIMILARITY.
FT   DISULFID    500    515       BY SIMILARITY.
FT   DISULFID    517    526       BY SIMILARITY.
FT   DISULFID    533    588       BY SIMILARITY.
FT   DISULFID    582    597       BY SIMILARITY.
FT   DISULFID    599    608       BY SIMILARITY.
FT   DISULFID    615    625       BY SIMILARITY.
FT   DISULFID    619    634       BY SIMILARITY.
FT   DISULFID    636    645       BY SIMILARITY.
FT   DISULFID    652    663       BY SIMILARITY.
FT   DISULFID    657    672       BY SIMILARITY.
FT   DISULFID    674    683       BY SIMILARITY.
FT   DISULFID    690    700       BY SIMILARITY.
FT   DISULFID    695    709       BY SIMILARITY.
FT   DISULFID    711    720       BY SIMILARITY.
FT   DISULFID    803    814       BY SIMILARITY.
FT   DISULFID    808    823       BY SIMILARITY.
FT   DISULFID    825    834       BY SIMILARITY.
FT   DISULFID    841    852       BY SIMILARITY.
FT   DISULFID    846    865       BY SIMILARITY.
FT   DISULFID    867    876       BY SIMILARITY.
FT   DISULFID    883    894       BY SIMILARITY.
FT   DISULFID    888    903       BY SIMILARITY.
FT   DISULFID    905    914       BY SIMILARITY.
FT   DISULFID    921    932       BY SIMILARITY.
FT   DISULFID    926    941       BY SIMILARITY.
FT   DISULFID    943    952       BY SIMILARITY.
FT   CARBOHYD    152    152       POTENTIAL.
FT   CARBOHYD    196    196       POTENTIAL.
FT   CARBOHYD    247    247       POTENTIAL.
FT   CARBOHYD    331    331       POTENTIAL.
FT   CARBOHYD    412    412       POTENTIAL.
FT   CARBOHYD    452    452       POTENTIAL.
FT   CARBOHYD    558    558       POTENTIAL.
FT   CARBOHYD    739    739       POTENTIAL.
FT   CARBOHYD    965    965       POTENTIAL.
FT   CARBOHYD    977    977       POTENTIAL.
FT   CARBOHYD   1004   1004       POTENTIAL.
FT   CARBOHYD   1030   1030       POTENTIAL.
FT   CARBOHYD   1150   1150       POTENTIAL.
FT   CONFLICT     14     17       MISSING (IN REF. 2).
FT   CONFLICT     27     27       P -> A (IN REF. 2).
FT   CONFLICT   1352   1352       T -> S (IN REF. 2).
SQ   SEQUENCE   1408 AA;  150660 MW;  A494A358 CRC32;



Query Match 23.5%; Score 190; DB 1; Length 1408; 

Best Local Similarity 41.3%; Pred. No. 1.30e-25;

Matches 33; Conservative 15; Mismatches 24; Indels 8; Gaps 

Db 613 DDCVGQ-CRNGATCIDLVNDYRCACASGFTGRDCE-TD-I--D-E-CATSPCRNGGECV 664 

lllll: 11:11 hi II II I: ll:| II : : I |:||::|: 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62 
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665 DMVGKFNCICPLGYSGSLCE 684 

: I II |::|: II 
63 WQQEPTCRC PPG FAG PRC E 82 



RESULT 14 

ID DLL1_MOUSE STANDARD; PRT; 722 AA.
AC Q61483;

DT 01-NOV-1997 (REL. 35, CREATED)
DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE)
DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)
DE DELTA-LIKE PROTEIN 1 PRECURSOR (DELTA1).
GN DLL1.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS. 
RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=BALB/C X C57BL/6; TISSUE=EMBRYO;
RX MEDLINE; 95401858.
RA BETTENHAUSEN B., DE ANGELIS M.H., SIMON D., GUENET J.-L., GOSSLER A.;
RT "Transient and restricted expression during mouse embryogenesis of
RT Dll1, a murine gene closely related to Drosophila Delta.";
RL DEVELOPMENT 121:2407-2418(1995).
CC -!- FUNCTION: MAY BE INVOLVED IN CELL-TO-CELL COMMUNICATION IN
CC MAMMALIAN EMBRYOS. MAY HAVE A ROLE IN CELLULAR INTERACTIONS
CC UNDERLYING SOMITOGENESIS AND DEVELOPMENT OF THE NERVOUS SYSTEM.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- TISSUE SPECIFICITY: IN THE EMBRYO, EXPRESSED IN THE PARAXIAL
CC MESODERM AND NERVOUS SYSTEM. EXPRESSED AT HIGH LEVELS IN ADULT
CC HEART AND AT LOWER LEVELS, IN ADULT LUNG.
CC -!- DEVELOPMENTAL STAGE: EXPRESSED UNTIL DAY 15 IN THE EMBRYO.
CC EXPRESSION THEN DECREASES AND INCREASES AGAIN IN THE ADULT.
CC -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: TO DROSOPHILA DELTA PROTEIN.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; X80903; G80657Q; -.
DR MGD; MGI:104659; DLL1.
DR PROSITE; PS00010; ASX_HYDROXYL; 3.
DR PROSITE; PS00022; EGF_1; 8.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 2.
DR PFAM; PF00008; EGF; 6.
DR HSSP; P00740; 1IXA.
KW SIGNAL; EGF-LIKE DOMAIN; GLYCOPROTEIN; TRANSMEMBRANE.



FT   SIGNAL        1     17       POTENTIAL.
FT   CHAIN        18    722       DELTA-LIKE PROTEIN 1.
FT   DOMAIN       18    545       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    546    568       POTENTIAL.
FT   DOMAIN      569    722       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN      225    253       EGF-LIKE 1.
FT   DOMAIN      256    284       EGF-LIKE 2.
FT   DOMAIN      291    324       EGF-LIKE 3.
FT   DOMAIN      331    362       EGF-LIKE 4, CALCIUM BINDING
FT   DOMAIN      369    401       EGF-LIKE 5.
FT   DOMAIN      408    439       EGF-LIKE 6.
FT   DOMAIN      446    477       EGF-LIKE 7, CALCIUM-BINDING
FT   DOMAIN      484    515       EGF-LIKE 8.
FT   DISULFID    225    236       BY SIMILARITY.
FT   DISULFID    229    242       BY SIMILARITY.
FT   DISULFID    244    253       BY SIMILARITY.
FT   DISULFID    256    267       BY SIMILARITY.
FT   DISULFID    262    273       BY SIMILARITY.
FT   DISULFID    275    284       BY SIMILARITY.
FT   DISULFID    291    303       BY SIMILARITY.
FT   DISULFID    297    313       BY SIMILARITY.
FT   DISULFID    315    324       BY SIMILARITY.
FT   DISULFID    331    342       BY SIMILARITY.
FT   DISULFID    336    351       BY SIMILARITY.
FT   DISULFID    353    362       BY SIMILARITY.
FT   DISULFID    369    380       BY SIMILARITY.
FT   DISULFID    374    390       BY SIMILARITY.
FT   DISULFID    392    401       BY SIMILARITY.
FT   DISULFID    408    419       BY SIMILARITY.
FT   DISULFID    413    428       BY SIMILARITY.
FT   DISULFID    430    439       BY SIMILARITY.
FT   DISULFID    446    466       BY SIMILARITY.
FT   DISULFID    468    477       BY SIMILARITY.
FT   DISULFID    484    495       BY SIMILARITY.
FT   DISULFID    489    504       BY SIMILARITY.
FT   DISULFID    506    515       BY SIMILARITY.
FT   CARBOHYD    476    476       POTENTIAL.
SQ   SEQUENCE    722 AA;  78448 MW;  5A647702 CRC32;



Query Match 23.3%; Score 188; DB 1; Length 722; 

Best Local Similarity 38.8%; Pred. No. 4.08e-25;

Matches 33; Conservative 17; Mismatches 28; Indels 7; Gaps ] 

Db 442 NVDDCASSPCANGGTCRDSVNDFSCTCPPGYTGKNC--SAP-V----SRCEHAPCHNGAT 494 

I III : I :|: I I II ::| II l::l I ::| I I |:: hi' 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60 

Db 495 CHQRGQRYMCECAQGYGGPNCQFLL 519 

I III: |::M I: |: 
Qy 61 CIWQQEPTCRCPPGFAGPRCEKLI 85 



RESULT 15
ID DLL1_HUMAN STANDARD; PRT; 723 AA.
AC O00548;
DT 15-JUL-1998 (REL. 36, CREATED)
DT 15-JUL-1998 (REL. 36, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE DELTA-LIKE PROTEIN 1 PRECURSOR (DELTA1).
GN DLL1.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.

RA MANN R.S., GRAY G.E., HENRIQUE D., ISH-HOROWICZ D., 

RA ARTAVANIS-TSAKONAS S.; 

RL SUBMITTED (MAY-1997) TO EMBL/GENBANK/DDBJ DATA BANKS. 

CC -!- FUNCTION: MAY BE INVOLVED IN CELL-TO-CELL COMMUNICATION IN 

CC MAMMALIAN EMBRYOS. MAY HAVE A ROLE IN CELLULAR INTERACTIONS 

CC UNDERLYING SOMITOGENESIS AND DEVELOPMENT OF THE NERVOUS SYSTEM (BY 

CC SIMILARITY). 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN. 

CC -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: TO DROSOPHILA DELTA PROTEIN. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).
CC
DR EMBL; AF003522; G2197069; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 3.
DR PROSITE; PS00022; EGF_1; 8.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 1.
DR PFAM; PF00008; EGF; 6.

DR HSSP; P00740; 1IXA. 

KW SIGNAL; EGF-LIKE DOMAIN; GLYCOPROTEIN; TRANSMEMBRANE. 
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FT 


SIGNAL 


1 


17 


POTENTIAL.


FT   CHAIN        18    723       DELTA-LIKE PROTEIN 1.
FT   DOMAIN       18    545       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    546    568       POTENTIAL.
FT   DOMAIN      569    723       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN      226    254       EGF-LIKE 1.
FT   DOMAIN      257    285       EGF-LIKE 2.
FT   DOMAIN      292    325       EGF-LIKE 3.
FT   DOMAIN      332    363       EGF-LIKE 4, CALCIUM BINDING (POTENTIAL).
FT   DOMAIN      370    402       EGF-LIKE 5.
FT   DOMAIN      409    440       EGF-LIKE 6.
FT   DOMAIN      447    478       EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      485    516       EGF-LIKE 8.
FT   DISULFID    226    237       BY SIMILARITY.
FT   DISULFID    230    243       BY SIMILARITY.
FT   DISULFID    245    254       BY SIMILARITY.
FT   DISULFID    257    268       BY SIMILARITY.
FT   DISULFID    263    274       BY SIMILARITY.
FT   DISULFID    276    285       BY SIMILARITY.
FT   DISULFID    292    304       BY SIMILARITY.
FT   DISULFID    298    314       BY SIMILARITY.
FT   DISULFID    316    325       BY SIMILARITY.
FT   DISULFID    332    343       BY SIMILARITY.
FT   DISULFID    337    352       BY SIMILARITY.
FT   DISULFID    354    363       BY SIMILARITY.
FT   DISULFID    370    381       BY SIMILARITY.
FT   DISULFID    375    391       BY SIMILARITY.
FT   DISULFID    393    402       BY SIMILARITY.
FT   DISULFID    409    420       BY SIMILARITY.
FT   DISULFID    414    429       BY SIMILARITY.
FT   DISULFID    431    440       BY SIMILARITY.
FT   DISULFID    447    467       BY SIMILARITY.
FT   DISULFID    469    478       BY SIMILARITY.
FT   DISULFID    485    496       BY SIMILARITY.
FT   DISULFID    490    505       BY SIMILARITY.
FT   DISULFID    507    516       BY SIMILARITY.
FT   CARBOHYD    477    477       POTENTIAL.
SQ   SEQUENCE    723 AA;  77956 MW;  A1D48BDB CRC32;



Query Match 23.3%; Score 188; DB 1; Length 723;

Best Local Similarity 37.6%; Pred. No. 4.08e-25;

Matches 32; Conservative 18; Mismatches 28; Indels 7; Gaps 3; 

Db 443 NVDDCASSPCANGGTCRDGVNDFSCTCPPGYTGRNC-SAP-V — SRCEHAPCHNGAT 495 
. I III : I :|: I I II ::| II |::| I ::| I I |:: Ml 

Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60

Db 496 CHERGHGYVCECARGYGGPNCQFLL 520

I : I h |::|| I: I: 
Qy 61 CIWQQEPTCRCPPGFAGPRCEKLI 85 



Search completed: Fri May 28 09:36:23 1999
Job time : 26 secs.
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Release 3.1A John F. Collins, Biocomputing Research Unit. 
Copyright (c) 1993-1998 University of Edinburgh, U.K. 

Distribution rights by Oxford Molecular Ltd
MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 09:36:42 1999; MasPar time 9.96 Seconds 

569.941 Million cell updates/sec 

Tabular output not generated. 

Title: >US-09-191-647-13 

Description: (1-104) from US09191647.pep

Perfect Score: 807 

Sequence: 1 NNDDCVGHKCRHGAQCVDEV ITVNFVGKDSYVELASAKVR 104 



Scoring table: 



PAM 150 
Gap 11 



179066 seqs, 54579741 residues 



Post-processing: Minimum Match 0% 

Listing first 45 summaries 

Database: sptrembl9

1:sp_archea 2:sp_bacteria 3:sp_fungi 4:sp_human
5:sp_invertebrate 6:sp_mammal 7:sp_mhc 8:sp_organelle
9:sp_phage 10:sp_plant 11:sp_rodent 12:sp_unclassified
13:sp_vertebrate 14:sp_virus

Statistics: Mean 36.193; Variance 55.571; scale 0.651

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.
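The run header above names the Smith-Waterman algorithm with a PAM 150 scoring table and a gap value of 11. As a rough illustration only, the recurrence behind such a local-alignment score can be sketched as follows. The function names, the toy match/mismatch scores, and the treatment of the gap value as a flat per-position penalty are all assumptions made for this sketch; the listing does not say how MPsrch applies its gap parameter, and the real search uses the PAM 150 table rather than the toy scores.

```python
def smith_waterman(query, target, score_fn, gap=11):
    """Return the best local alignment score of two sequences.

    Assumption: a linear gap penalty of `gap` per gapped position.
    """
    prev = [0] * (len(target) + 1)  # scores for the previous query residue
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            cell = max(
                0,                             # start a new local alignment
                prev[j - 1] + score_fn(q, t),  # align q with t
                prev[j] - gap,                 # gap in the target
                curr[j - 1] - gap,             # gap in the query
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def toy_score(a, b):
    """Hypothetical stand-in for a substitution table such as PAM 150."""
    return 5 if a == b else -3
```

With these toy scores, `smith_waterman("GGACDCGG", "ACDC", toy_score)` scores only the shared `ACDC` stretch, which is the sense in which the alignment is local.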



                                SUMMARIES

              Query
 No.  Score   Match  Length  DB  ID      Description              Pred. No.
   1    793    98.3    1523  11  088280  MEGF5.                   2.41e-180
   2    770    95.4     739   4  075094  MEGF5 (FRAGMENT).        3.98e-174
   3    394    48.8    1531  11  088279  MEGF4.                   2.54e-74
   4    343    42.5     530   5  Q24526  SLIT LOCUS ENCODING A    3.00e-61
   5    307    38.0     601   5  Q20204  F40E10.4 PROTEIN (FRAG   3.79e-52
   6    236    29.2    2531   5  016004  NOTCH HOMOLOG.           1.26e-34
   7    227    28.1     529   5  Q25058  FIBROPELLIN IA (FRAGME   1.85e-32
   8    212    26.3    1212  13  042347  C-SERRATE-2 (FRAGMENT)   6.99e-29
   9    209    25.9    1193  13  Q90819  C-SERATE-1 PROTEIN (FR   3.59e-28
  10    208    25.8     752  13  042374  NOTCH RECEPTOR PROTEIN   6.18e-28
  11    205    25.4     406   5  Q25059  FIBROPELLIN III (FRAGM   3.15e-27
  12    203    25.2    1722   5  Q19350  SIMILAR TO EGF-LIKE RE   9.30e-27
  13    199    24.7     717  13  P87357  DELTAD TRANSMEMBRANE P   8.07e-26
  14    199    24.7     802  13  057462  DELTAA.                  8.07e-26
  15    197    24.4     387  11  Q06007  NOTCH PROTEIN HOMOLOG    2.37e-25
  16    197    24.4    2447  13  013149  NOTCH 2 (FRAGMENT).      2.37e-25
  17    196    24.3    1203  11  Q06008  NOTCH PROTEIN HOMOLOG    4.05e-25
  18    196    24.3    1574  11  088281  MEGF6.                   4.05e-25
  19    196    24.3    2470  11  035516  CELL SURFACE PROTEIN.    4.05e-25
  20    193    23.9    1476  13  Q90285  PUTATIVE EXTRACELLULAR   2.03e-24
  21    191    23.7    2352   5  061240  HRNOTCH PROTEIN.         5.90e-24
  22    190    23.5    1372   5  P91526  SIMILARITY TO MULTIPLE   1.01e-23
  23    190    23.5    2653   5  025253  NOTCH HOMOLOG SCALLOPE   1.01e-23
  24    188    23.3     955   4  Q99466  NOTCH4 (FRAGMENT).       2.92e-23
  25    188    23.3    1218   4  Q15816  TRANSMEMBRANE PROTEIN    2.92e-23
  26    188    23.3    1218   4  014902  TRANSMEMBRANE PROTEIN    2.92e-23
  27    188    23.3    1219  11  Q63722  JAGGED PROTEIN.          2.92e-23
  28    188    23.3    1227   4  P78504  JAGGED 1 (TRANSMEMBRAN   2.92e-23
  29    188    23.3    1999   4  Q99940  NOTCH4.                  2.92e-23
  30    188    23.3    2003   4  000306  NOTCH4.                  2.92e-23
  31    186    23.0    1218   4  015122  JAGGED1.                 8.47e-23
  32    185    22.9    1964  11  035442  NOTCH4.                  1.44e-22
  33    182    22.6     156   5  026661  EPIDERMAL GROWTH FACTO   7.05e-22
  34    180    22.3     263   4  099734  NOTCH2 TRANSMEMBRANE P   2.03e-21
  35    178    22.1     434  11  055139  JAGGED2 PROTEIN (FRAGM   5.81e-21
  36    178    22.1     518  11  070219  JAGGED 2 (JAGGED 2 PRO   5.81e-21
  37    178    22.1     615  13  057409  DELTAB.                  5.81e-21
  38    178    22.1     728  13  Q90656  TRANSMEMBRANE PROTEIN    5.81e-21
  39    178    22.1     762  13  042373  NOTCH RECEPTOR PROTEIN   5.81e-21
  40    178    22.1    1202  11  P97607  JAGGED2 (FRAGMENT).      5.81e-21
  41    176    21.8     721  13  Q91902  X-DELTA-1.               1.66e-20
  42    175    21.7     153   4  075095  MEGF6 (FRAGMENT).        2.80e-20
  43    174    21.6    2824   5  P90891  F55H12.3 PROTEIN.        4.73e-20
  44    172    21.3     259   5  Q93519  F16B12.2 PROTEIN.        1.34e-19
  45    171    21.2     832   5  Q99108  NEUROGENIC LOCUS DELTA   2.26e-19



RESULT 1
ID 088280 PRELIMINARY; PRT; 1523 AA.
AC 088280;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF5.
GN MEGF5.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011531; D1033424; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1523 AA; 167767 MW; 2BD845D0 CRC32;



Query Match 98.3%; Score 793; DB 11; Length 1523; 

Best Local Similarity 97.1%; Pred. No. 2.41e-180; 

Matches 101; Conservative 2; Mismatches 1; Indels 0; Gaps 0; 

Db 1074 DNDDCVAHKCRHGAQCVDAVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 1133 

:MI!l:|lllilMII IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIMI 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60 

Db 1134 CIWQQEPTCRCPPGFAGPRCEKLITVNFVGKDSYVELASAKVR 1177 

IIIIIIIIIIIIIIIIIIIIIIIIIIMIIMIimilllll! 
Qy 61 CIWQQEPTCRCPPGFAGPRCEKLITVNFVGKDSYVELASAKVR 104 
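The percentages printed with each result appear to follow from the other figures in the listing. The sketch below is inferred from the numbers themselves, not from MPsrch documentation, and its function names are hypothetical: Query Match looks like the score divided by the Perfect Score of 807 from the run header, and Best Local Similarity looks like the match count divided by the total number of alignment columns.

```python
def query_match_percent(score, perfect_score):
    """Score as a percentage of the query's perfect (self-match) score."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_percent(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all alignment columns."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


# RESULT 1: Score 793, Perfect Score 807, Matches 101, Conservative 2,
# Mismatches 1, Indels 0.
result_1_query_match = query_match_percent(793, 807)
result_1_similarity = best_local_similarity_percent(101, 2, 1, 0)
```

For RESULT 1 these give 98.3 and 97.1, agreeing with the printed values; the same arithmetic reproduces 48.8 and 51.5 for RESULT 3 (Score 394; Matches 51, Conservative 24, Mismatches 20, Indels 4).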



RESULT 2 

ID 075094 PRELIMINARY; PRT; 739 AA.
AC 075094;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
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DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 
DE MEGF5 (FRAGMENT). 
GN MEGF5. 

OS HOMO SAPIENS (HUMAN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO. 
RN [1] 

RP SEQUENCE FROM N.A. 
RC TISSUE-BRAIN; 
RX MEDLINE; 98360089. 

RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011538; D1033429; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1
SQ SEQUENCE 739 AA; 80364 MW; DC6BCB63 CRC32;

Query Match 95.4%; Score 770; DB 4; Length 739;
Best Local Similarity 94.2%; Pred. No. 3.98e-174;
Matches 98; Conservative 3; Mismatches 3; Indels 0; Gaps 0; 

Db 290 DNDDCVAHKCRHGAQCVDTINGYTCTCPQGFSGPFCEHPPPMVLLQTSPCDQYECQNGAQ 349 

MM lllllll IIIMIIIIIIIIIIMIIIIMIII 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60 

Db 350 CIWQQEPTCRCPPGFAGPRCEKLITVNFVGKDSYVELASAKVR 393 

M 1 1 M 1 1 1 1 1 1! I ! 1 1 1 1 1 II 1 1 1 1 M 1 1 ( I M 1 1 1 1 1 1 1 1 1 1 
Qy 61 CIWQQEPTCRCPPGFAGPRCEKLITVNFVGKDSYVELASAKVR 104 



RESULT 3 

ID 088279 PRELIMINARY; PRT; 1531 AA.
AC 088279;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF4.
GN MEGF4.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011530; D1033423; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1531 AA; 167497 MW; 5C5EBDF4 CRC32;

Query Match 48.8%; Score 394; DB 11; Length 1531;
Best Local Similarity 51.5%; Pred. No. 2.54e-74;

Matches 51; Conservative 24; Mismatches 20; Indels 4; Gaps 3; 

Db 1083 NQDDCKDHQCQNGAQCVDEINSYACLCAEGYSGQLCEIPPA-P-'-RNS-CEGTECQNGAN 1138 

I III I I::|||||||:|:|:|:|::|:|| :|| I: : | : III' 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60 

Db 1139 CVDQGSRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFT 1177 

I: I 1:1 111:11 llll::|||| :|:|:::: 
Qy 61 CIWQQEPTCRCPPGFAGPRCEKLITVNFVGKDSYVELA 99 



RESULT 4 

ID Q24526 PRELIMINARY; PRT; 530 AA. 

AC Q24526; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE SLIT LOCUS ENCODING A PROTEIN ASSOCIATED WITH NEURAL DEVELOPMENT WITH
DE 52D EGF HOMOLOGOUS DOMAINS (FRAGMENT).
OS DROSOPHILA MELANOGASTER (FRUIT FLY).
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=CANTON S;
RX MEDLINE; 89077533.
RA ROTHBERG J.M., HARTLEY D.A., WALTHER Z., ARTAVANIS-TSAKONAS S.;
RT "slit: an EGF-homologous locus of D. melanogaster involved in the
RT development of the embryonic central nervous system.";
RL CELL 55:1047-1059(1988).
DR EMBL; M23543; G514357; -.
DR FLYBASE; FBgn0003425; sli.
DR PROSITE; PS01186; EGF_2; 5.
DR PROSITE; PS01187; EGF_CA; 2.
DR PFAM; PF00008; EGF; 7.
DR PFAM; PF00054; laminin_G; 1.
KW NEUROGENESIS; GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 530 530
SQ SEQUENCE 530 AA; 59457 MW; 10E5764D CRC32;

Query Match 42.5%; Score 343; DB 5; Length 530;
Best Local Similarity 42.2%; Pred. No. 3.00e-61;

Matches 46; Conservative 26; Mismatches 31; Indels 6; Gaps 5; 

Db 184 NIDDCQNHMCQNGGTCVDGINDYQCRCPDDYTGKYCEGHNMISMMYPQTSPCQNHECKHG 243 

I III hl::|: III :| I I II: ::| :|l I :|: Mill: II I 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCE-HPP-PMVLLQTSPCDQYECQNG 58 

Db 244 V-CFQPNAQGSDYLCRCHPGYTGKWCEYLTSISFVHNNSFVELEPLRTR 291 

1: I : III II:: :|| I :::|| ::|:||| : : I 
Qy 59 AQCIWQ-QEPT--CRCPPGFAGPRCEKLITVNFVGKDSYVELASAKVR 104 



RESULT 5 

ID Q20204 PRELIMINARY; PRT; 601 AA. 

AC Q20204; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED)
DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)
DT 01-AUG-1998 (TREMBLREL. 07, LAST ANNOTATION UPDATE)
DE F40E10.4 PROTEIN (FRAGMENT).
GN F40E10.4.
OS CAENORHABDITIS ELEGANS.
OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA;
OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS.
RN [1]
RP SEQUENCE FROM N.A.
RA SMYER.;
RL SUBMITTED (FEB-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.
RN [2]
RP SEQUENCE FROM N.A.
RX MEDLINE; 94150718.
RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,
RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,
RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,
RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,
RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,
RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,
RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,
RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,
RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,
RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C. 
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RT elegans . " ; 

RL NATURE 368:32-38(1994).
DR EMBL; Z69792; E1346469; -.
DR PROSITE; PS01187; EGF_CA; 1.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1
SQ SEQUENCE 601 AA; 66669 MW; F8A72773 CRC32;

Query Match 38.0%; Score 307; DB 5; Length 601;
Best Local Similarity 37.5%; Pred. No. 3.79e-52;

Matches 39; Conservative 24; Mismatches 36; Indels 5; Gaps 4; 

Db 216 NIDDCKNVECQNGGSCVDGILSYDCLCRPGYAGQYCEIPPMMDMEYQKTDACQQSACGQG 275

I III I::|: III : :l hi l::| :ll II I : II :|:| I I 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVL-LQ-TSPCDQYECQNG 58 

Db 276 -ECVASQNSSDFTCKCHEGFSGPSCDRQMSVGFKNPGAYLALDP 318
:|: I : IN Ihll I" "I I :|: I : 
Qy 59 AQCIWQQEP--TCRCPPGFAGPRCEKLITVNFVGKDSYVELAS 100



RESULT 6 

ID 016004 PRELIMINARY; PRT; 2531 AA. 

AC 016004; 

DT 01-JAN-1998 (TREMBLREL. 05, CREATED)
DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE NOTCH HOMOLOG.
OS LYTECHINUS VARIEGATUS (SEA URCHIN).
OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; EUECHINOIDEA;
OC ECHINACEA; TEMNOPLEUROIDA; TOXOPNEUSTIDAE; LYTECHINUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 97454256.
RA SHERWOOD D.R., MCCLAY D.R.;
RT "Identification and localization of a sea urchin Notch homologue:
RT insights into vegetal plate regionalization and Notch receptor
RT regulation.";
RL DEVELOPMENT 124:3363-3374(1997).
DR EMBL; AF000634; G2570351; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 21.
DR PROSITE; PS01186; EGF_2; 25.
DR PROSITE; PS01187; EGF_CA; 20.
DR PFAM; PF00008; EGF; 34.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 2531 AA; 273982 MW; BB9C6F3D CRC32;

Query Match 29.2%; Score 236; DB 5; Length 2531;
Best Local Similarity 44.0%; Pred. No. 1.26e-34;

Matches 37; Conservative 17; Mismatches 21; Indels 9; Gaps 4; 

Db 779 NIDDCVDEPCLNGGICIDEVNSFQCVCPQTFVGLLCE T-ERSPCEDNQCQNGAT 831 

I Nil I :|: MM!:: hill I Ihll : III:: : hi' 

Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60 

Db 832 CVYSEDYAGYSCRCTSGFQGNFCD 855 

h :lll :ll I h 

Qy 61 CIWQQ-EP-TCRCPPGFAGPRCE 82 



RESULT 7 

ID Q25058 PRELIMINARY; PRT; 529 AA. 

AC Q25058; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE FIBROPELLIN IA (FRAGMENT).
OS HELIOCIDARIS ERYTHROGRAMMA (SEA URCHIN).
OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; EUECHINOIDEA;
OC ECHINACEA; ECHINOIDA; ECHINOMETRIDAE; HELIOCIDARIS.
RN [1]
RP SEQUENCE FROM N.A.
RA BISGROVE B.W.;
RL SUBMITTED (JUN-1995) TO EMBL/GENBANK/DDBJ DATA BANKS.
DR EMBL; L33861; G499686; -.
DR PROSITE; PS00577; AVIDIN; 1.
DR PROSITE; PS01186; EGF_2; 10.
DR PROSITE; PS01187; EGF_CA; 7.
DR PFAM; PF00008; EGF; 10.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1

SQ SEQUENCE 529 AA; 55543 MW; 6385F322 CRC32; 

Query Match 28.1%; Score 227; DB 5; Length 529; 

Best Local Similarity 42.7%; Pred. No. 1.85e-32; 

Matches 35; Conservative 15; Mismatches 25; Indels 7; Gaps 2; 

Db 287 NINECASGPCQNGGTCVDGVNGFVCQCPPNYTGTYCE ISLDA- -CSSMPCQNGAT 339 

I ::| : h:h III III: I II ::| :|| : |:: I Mill 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60 

Db 340 CVNVGANYICECPPGFAGQNCE 361 

h I : I MINI || 
Qy 61 CIWQQEPTCRCPPGFAGPRCE 82 



RESULT 8 

ID 042347 PRELIMINARY; PRT; 1212 AA. 

AC 042347; 

DT 01-JAN-1998 (TREMBLREL. 05, CREATED) 

DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE C-SERRATE-2 (FRAGMENT). 

OS GALLUS GALLUS (CHICKEN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ARCHOSAURIA; AVES;
OC NEOGNATHAE; GALLIFORMES; PHASIANIDAE; PHASIANINAE; GALLUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 97184054.
RA HAYASHI H., MOCHII M., KODAMA R., HAMADA Y., MIZUNO N., EGUCHI G.,
RA TACHI C.;
RT "Isolation of a novel chick homolog of Serrate and its coexpression
RT with C-Notch-1 in chick development.";
RL INT. J. DEV. BIOL. 40:1089-1096(1996).
DR EMBL; D87558; D1022568; -.
DR PROSITE; PS01186; EGF_2; 10.
DR PROSITE; PS01187; EGF_CA; 8.
DR PFAM; PF00008; EGF; 14.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1

SQ SEQUENCE 1212 AA; 134188 MW; 0ECF076C CRC32; 

Query Match 26.3%; Score 212; DB 13; Length 1212; 

Best Local Similarity 39.3%; Pred. No. 6.99e-29; 

Matches 33; Conservative 19; Mismatches 25; Indels 7; Gaps 4; 

Db 468 ETNECESNPCQNGGRCKDLVNGFTCLCAQGFSGVFCE----MDI-D-F-CEPNPCQNGAK 520

: ::l :: |::|::| I 1 1 1 : 1 1 : h 1 1 1 1 h 1 1 1 I : : h lllll 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60 

Db 521 CYDLGGDYYCACPDDYDGKNCSHL 544 

I : : I II : I II 
Qy 61 CIWQQEPTCRCPPGFAGPRCEKL 84 



RESULT 9 

ID Q90819 PRELIMINARY; PRT; 1193 AA. 

AC Q90819; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE C-SERATE-1 PROTEIN (FRAGMENT).

OS GALLUS GALLUS (CHICKEN).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ARCHOSAURIA; AVES; 
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OC NEOGNATHAE; GALLIFORMES; PHASIANIDAE; PHASIANINAE; GALLUS, 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=SPINAL CORD; 

RX MEDLINE; 96175595.
RA MYAT A., HENRIQUE D., ISH-HOROWICZ D., LEWIS J.;
RT "A chick homologue of Serrate and its relationship with Notch and
RT Delta homologues during central neurogenesis.";
RL DEV. BIOL. 174:233-247(1996).
DR EMBL; X95283; E224084; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 10.
DR PROSITE; PS01186; EGF_2; 12.
DR PROSITE; PS01187; EGF_CA; 8.
DR PFAM; PF00008; EGF; 14.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1
SQ SEQUENCE 1193 AA; 131039 MW; 55E5FCD1 CRC32;

Query Match 25.9%; Score 209; DB 13; Length 1193;
Best Local Similarity 37.8%; Pred. No. 3.59e-28;
Matches 31; Conservative 18; Mismatches 26; Indels 7; Gaps 2;

Db 463 NECASNPCMNGGHCQDEINGFQCLCPAGFSGNLC-Q------LDIDYCEPNPCQNGAQCF 515
"I - I :|::| Ihll: hll llll :| : I: |: lllllll: 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62 

Db 516 NLAMDYFCNCPEDYEGKNCSHL 537 

: : I II : I I I 
Qy 63 WQQEPTCRCPPGFAGPRCEKL 84 



RESULT 10 

ID 042374 PRELIMINARY; PRT; 752 AA. 

AC 042374; 

DT 01-JAN-1998 (TREMBLREL. 05, CREATED)
DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE NOTCH RECEPTOR PROTEIN (FRAGMENT).
GN NOTCH6.
OS BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII;
OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA;
OC CYPRINIDAE; RASBORINAE; DANIO.
RN [1]
RP SEQUENCE FROM N.A.
RA WESTIN J., LARDELLI M.;
RL DEV. GENES EVOL. 207:51-63(1997).
DR EMBL; Y10354; E293438; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 9.
DR PROSITE; PS01186; EGF_2; 15.
DR PROSITE; PS01187; EGF_CA; 7.
DR PFAM; PF00008; EGF; 16.
DR PFAM; PF00066; notch; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1
FT NON_TER 752 752
SQ SEQUENCE 752 AA; 82103 MW; 72E254FB CRC32;

Query Match 25.8%; Score 208; DB 13; Length 752;
Best Local Similarity 38.0%; Pred. No. 6.18e-28;

Matches 30; Conservative 15; Mismatches 29; Indels 5; Gaps 3; 

Db 462 GPRCKNGGQCVDGVGRYTCNCPPGFAGEHCEGDVNEC-RSGPC-YS-PGTIDCVPLIN 516 

I :h:|:|lll I III II Ihl II I :|: : 

Qy 7 GHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCIWQQ 66 

Db 517 SYQCRCRLGYTGQRCESMV 535 

III I::| III 
Qy 67 EPTCRCPPGFAGPRCEKLI 85 



RESULT 11 

ID Q25059 PRELIMINARY; PRT; 406 AA. 



AC Q25059; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE FIBROPELLIN III (FRAGMENT).

OS HELIOCIDARIS ERYTHROGRAMMA (SEA URCHIN).

OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; EUECHINOIDEA; 

OC ECHINACEA; ECHINOIDA; ECHINOMETRIDAE; HELIOCIDARIS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RA BISGROVE B.W.; 

RL SUBMITTED (JUN-1995) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; L33862; G499688; -. 

DR PROSITE; PS00577; AVIDIN; 1. 

DR PROSITE; PS01186; EGF_2; 6.

DR PROSITE; PS01187; EGF_CA; 5.

DR PFAM; PF00008; EGF; 7. 

KW GLYCOPROTEIN; EGF-LIKE DOMAIN. 

FT NON_TER 1 1

SQ SEQUENCE 406 AA; 43475 MW; 45E6EE2C CRC32; 

Query Match 25.4%; Score 205; DB 5; Length 406;

Best Local Similarity 41.2%; Pred. No. 3.15e-27;

Matches 35; Conservative 13; Mismatches 30; Indels 7; Gaps 4; 

Db 12 DGDXNPNPCQNGAACIDQVNDYECICPPGFTGDNCE-TD-I--DV--CASAPCRNGGA 64 

: II! : l::ll hhl! I llll Ihl II : : I Ml: 
Qy 1 NNDDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQ 60 

Db 65 CVDGVNGYTCNCIPGFDGDNCENNI 89 

I: II I III I MM 
Qy 61 CIWQQEPTCRCPPGFAGPRCEKLI 85 



RESULT 12 

ID Q19350 PRELIMINARY; PRT; 1722 AA. 

AC Q19350; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED)

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE SIMILAR TO EGF-LIKE REPEATS. NCBI GI: 1125776. 

GN F11C7.4. 

OS CAENORHABDITIS ELEGANS. 

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA; 

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=BRISTOL N2;

RX MEDLINE; 94150718. 

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,

RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,

RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,

RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,

RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,

RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.

RT elegans.";

RL NATURE 368:32-38(1994). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=BRISTOL N2;

RA TAICH A., VETTER J.; 

RL SUBMITTED (JAN-1996) TO EMBL/GENBANK/DDBJ DATA BANKS. 

DR EMBL; U42839; G1125776; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 5.

DR PROSITE; PS01186; EGF_2; 19.

DR PROSITE; PS01187; EGF_CA; 3.

DR PFAM; PF00008; EGF; 24.

KW GLYCOPROTEIN; EGF-LIKE DOMAIN.

SQ SEQUENCE 1722 AA; 188383 MW; CCFB86B8 CRC32;

Query Match 25.2%; Score 203; DB 5; Length 1722;

Best Local Similarity 38.6%; Pred. No. 9.30e-27;

Matches 32; Conservative 22; Mismatches 21; Indels 8; Gaps 6; 

Db 163 DEDECKENFCQNGADC-ENLKGSYECKCLKGFSGKYCEIQDKK---QCTS-D-Y-CHNNG 215

"1:1 = I::||:| ::::l I I I llll :|l : I :: I I |:| : 
Qy 1 NNDDCVGHKCRHGAQCVDEVNG-YTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGA 59 

Db 216 QCISTGSDLSCKCSPGFDGAFCE 238 

III : :|:|:|l! |: II 
Qy 60 QCIWQQEPTCRCPPGFAGPRCE 82 



RESULT 13 

ID P87357 PRELIMINARY; PRT; 717 AA.

AC P87357;

DT 01-MAY-1997 (TREMBLREL. 03, CREATED)

DT 01-MAY-1997 (TREMBLREL. 03, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE DELTAD TRANSMEMBRANE PROTEIN PRECURSOR. 

GN DELTAD. 

OS BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII;

OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA;

OC CYPRINIDAE; RASBORINAE; DANIO.

RN [1]

RP SEQUENCE FROM N.A.

RX MEDLINE; 97346722.

RA DORNSEIFER P., TAKKE C., CAMPOS-ORTEGA J.A.;

RT "Overexpression of a zebrafish homologue of the Drosophila neurogenic 

RT gene Delta perturbs differentiation of primary neurons and somite 

RT development."; 

RL MECH. DEV. 63:159-171(1997). 

DR EMBL; Y11760; E307461; -.

DR PROSITE; PS01186; EGF_2; 8.

DR PROSITE; PS01187; EGF_CA; 2.

DR PFAM; PF00008; EGF; 6.

KW SIGNAL; TRANSMEMBRANE; GLYCOPROTEIN; EGF-LIKE DOMAIN. 

FT SIGNAL 1 19 POTENTIAL. 

FT CHAIN 20 717 DELTAD TRANSMEMBRANE PROTEIN. 

SQ SEQUENCE 717 AA; 79061 MW; 5CC32ECA CRC32; 

Query Match 24.7%; Score 199; DB 13; Length 717;

Best Local Similarity 40.5%; Pred. No. 8.07e-26;

Matches 32; Conservative 14; Mismatches 26; Indels 7; Gaps 4;

Db 405 DHCSSNPCSNDAQCLDLVDSYLCQCPEGFTGTHCEDN--I--D-E-CATYPCQNGGTCQ 457 

I I :: I : MM |::| I ||:||:| I : : | | Ml): I 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62 

Db 458 DGLSDYTCTCPPGYTGKNC 476 

: II lll|::| I 
Qy 63 WQQEPTCRCPPGFAGPRC 81 



RESULT 14 

ID O57462 PRELIMINARY; PRT; 802 AA.

AC O57462;

DT 01-JUN-1998 (TREMBLREL. 06, CREATED) 

DT 01-JUN-1998 (TREMBLREL. 06, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE DELTAA. 

GN DELTAA.

OS BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII; 

OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA; 

OC CYPRINIDAE; RASBORINAE; DANIO.

RN [1]

RP SEQUENCE FROM N.A.

RX MEDLINE; 98165392.

RA APPEL B., EISEN J.S.; 



RT "Regulation of neuronal specification in the zebrafish spinal cord by 

RT Delta function."; 

RL DEVELOPMENT 125:371-380(1998). 

DR EMBL; AF030031; G2809389; -. 

DR PROSITE; PS01186; EGF_2; 8. 

KW GLYCOPROTEIN. 

SQ SEQUENCE 802 AA; 88941 MW; 42F041BD CRC32; 

Query Match 24.7%; Score 199; DB 13; Length 802;

Best Local Similarity 38.1%; Pred. No. 8.07e-26; 

Matches 32; Conservative 16; Mismatches 29; Indels 7; Gaps 2; 

Db 447 DHCSSSPCSNGARCVDLVNSYLCQCPDGFTGMNCDRAGD— -E--CSMYPCQNGGTCQ 499 

II: :||:||| ||:| | ||:||:|: ::: : | | I |: 
Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCI 62

Db 500 EGASGYMCTCPPGYTGRNCSSPVS 523 

I !lll::| I :: 
Qy 63 WQQEPTCRCPPGFAGPRCEKLIT 86



RESULT 15 

ID Q06007 PRELIMINARY; PRT; 387 AA. 

AC Q06007; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)

DE NOTCH PROTEIN HOMOLOG 1 (MOTCH A PROTEIN) (FRAGMENT).

GN NOTCH1 OR MOTCH A.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=F1 (CBA X C57BL); TISSUE=WHOLE EMBRYO;

RX MEDLINE; 93178563. 

RA LARDELLI M., LENDAHL U.; 

RT "Motch A and motch B--two mouse Notch homologues coexpressed in a 

RT wide variety of tissues."; 

RL EXP. CELL RES. 204:364-372(1993).

DR EMBL; X68278; G287988; -.

DR MGD; MGI:97363; NOTCH1.

DR PFAM; PF00008; EGF; 6. 

DR PFAM; PF00066; notch; 3. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT. 

FT NON_TER 1 1

FT NON_TER 387 387

SQ SEQUENCE 387 AA; 41497 MW; D1FD6C00 CRC32; 

Query Match 24.4%; Score 197; DB 11; Length 387; 

Best Local Similarity 35.8%; Pred. No. 2.37e-25;

Matches 29; Conservative 15; Mismatches 36; Indels 1; Gaps 1; 

Db 25 NECLSQPCQNGGTCIDLTNSYKCSCPRGTQGVHCEINVDDCHPPLDPASRSPKCFNNGTC 84 

::|::: l"h hi hi I Ihl h II I : I I : I 

Qy 3 DDCVGHKCRHGAQCVDEVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQY-ECQNGAQC 61 

Db 85 VDQVGGYTCTCPPGFVGERCE 105 

: II Mill I III 
Qy 62 IWQQEPTCRCPPGFAGPRCE 82 



Search completed: Fri May 28 09:37:15 1999
Job time: 33 secs.

Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 09:39:01 1999; MasPar time 11.42 Seconds

452.494 Million cell updates/sec 

Tabular output not generated. 



Title: >US-09-191-647-14

Description: (1-243) from US09191647.pep

Perfect Score: 1905

Sequence: 1 ILDVASLRQAPGENGTSFHG...SSFVDEVEKWKCGCARCAS 243



Scoring table: PAM 150 
Gap 11 

Searched: 170751 seqs, 21266608 residues 

Post-processing: Minimum Match 0% 

Listing first 45 summaries 



Database: a-geneseq35
1:part1 2:part2 3:part3 4:part4 5:part5 6:part6 7:part7
8:part8 9:part9 10:part10 11:part11 12:part12 13:part13
14:part14 15:part15 16:part16 17:part17 18:part18
19:part19 20:part20 21:part21 22:part22 23:part23
24:part24 25:part25 26:part26 27:part27 28:part28
29:part29 30:part30 31:part31 32:part32 33:part33
34:part34 35:part35 36:part36 37:part37 38:part38
39:part39



Statistics: Mean 31.391; Variance 131.102; scale 0.239



Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES 



Result          Query
   No.  Score   Match  Length  DB  ID      Description            Pred. No.

     1   1100    57.7    1534  30  W46966  Amino acid sequence o  2.21e-97
     2    570    29.9     228  30  W46967  Amino acid sequence o  4.05e-44
     3    499    26.2     183  30  W46968  Amino acid sequence o  4.14e-37
     4    275    14.4    1872  36  W68510  Partial human Notch-3  1.76e-15
     5    275    14.4    2321  36  W49698  Human Notch3 protein.  1.76e-15
     6    263    13.8     612  28  W39256  Human partial mature   2.33e-14
     7    263    13.8     737  28  W39257  Human membrane protei  2.33e-14
     8    258    13.5    1036  25  W18351  Proliferation and dif  6.84e-14
     9    258    13.5    1187  25  W18352  Proliferation and dif  6.84e-14
    10    258    13.5    1208  28  W40827  Human Jagged protein.  6.84e-14
    11    258    13.5    1218  29  W44301  Human serrate 1.       6.84e-14
    12    258    13.5    1218  19  W05833  Human Serrate-1 (HJ1)  6.84e-14
    13    256    13.4    1193  19  W05835  Chick Serrate.         1.05e-13
    14    246    12.9    1218  25  W18354  Proliferation and dif  8.92e-13
    15    245    12.9    1480   5  R25079  Drosophila SLIT prote  1.10e-12
    16    238    12.5    1055  29  W44298  Human serrate 2 prote  4.90e-12
    17    238    12.5    1212  29  W44299  Human serrate 2.       4.90e-12
    18    238    12.5    1257  19  W05834  Human Serrate-2 (HJ2)  4.90e-12
    19    234    12.3     473  17  R86869  Adhesive protein.      1.15e-11
    20    230    12.1     833   6  R28960  Delta Dll.             2.67e-11
    21    216    11.3     727  21  W11719  C-Delta-1 polypeptide  5.12e-10
    22    216    11.3     740  21  W00876  C-Delta-1 polypeptide  5.12e-10
    23    213    11.2     722  21  W11720  M-Delta-1 polypeptide  9.60e-10
    24    211    11.1     520  25  W18348  Proliferation and dif  1.46e-09
    25    211    11.1     702  25  W18349  Proliferation and dif  1.46e-09
    26    211    11.1     723  25  W18353  Proliferation and dif  1.46e-09
    27    205    10.8     157  21  W11730  H-Delta-1 polypeptide  5.11e-09
    28    205    10.8     660  21  W11725  H-Delta-1 polypeptide  5.11e-09
    29    206    10.8     685  37  W80813  Nucleotide sequence o  4.15e-09
    30    196    10.3     383  10  R56166  Neuroendocrine tumor   3.31e-08
    31    196    10.3    1404   7  R38304  Sequence of a serrate  3.31e-08
    32    191    10.0    1257   9  R46627  Neurocan core protein  9.30e-08
    33    188     9.9    2409   3  R12609  Versican.              1.72e-07
    34    179     9.4     385  10  R56167  Neuroendocrine tumor   1.09e-06
    35    177     9.3     751  10  R53088  Human masking protein  1.64e-06
    36    177     9.3     752  10  R53087  Human masking protein  1.64e-06
    37    177     9.3     756  10  R53086  Human masking protein  1.64e-06
    38    177     9.3     845  10  R53089  Human masking protein  1.64e-06
    39    177     9.3    1355   3  R14584  TGF beta 1 binding pr  1.64e-06
    40    177     9.3    1712   4  R22461  Masking protein high   1.64e-06
    41    169     8.9     179  37  W75100  Human secreted protei  8.29e-06
    42    166     8.7     379   5  R25565  Beta-IG-M1.            1.52e-05
    43    163     8.6      77   6  R28962  BLR-11 and -12.        2.77e-05
    44    162     8.5     559  25  W30844  Partial rat thrombomo  3.39e-05
    45    162     8.5     577  25  W30845  Rat thrombomodulin.    3.39e-05

RESULT 1 

ID W46966 standard; Protein; 1534 AA. 

AC W46966; 

DT 06-JUL-1998 (first entry) 

DE Amino acid sequence of a human slit-like polypeptide.

KW Slit-like protein; human; diagnosis; treatment; brain-specific disease;

KW cancer; antibody.

OS Homo sapiens.

FH Key Location/Qualifiers

FT Peptide 1..26

FT /note= "signal peptide"

FT Protein 27..1534

FT /note= "mature protein"

PN J10087699-A. 

PD 07-APR-1998. 

PF 15-JUL-1997; 205351. 

PR 16-JUL-1996; JP-186219. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

DR WPI; 98-267127/24. 

DR N-PSDB; V16978. 

PT Human Slit-like protein - useful for diagnosis and treatment of 

PT brain-specific diseases and cancers 

PS Disclosure; Pages 31-35; 45pp; Japanese. 

CC The present sequence represents a novel human slit-like protein (the 

CC mature protein is claimed in Claim 1). The slit-like polypeptide is

CC useful for diagnosis and treatment of brain-specific diseases and 

CC cancers. Antibodies directed against the protein, or its fragments 

CC can also be used for diagnosing cancer. 

SQ Sequence 1534 AA; 

Query Match 57.7%; Score 1100; DB 30; Length 1534; 

Best Local Similarity 55.9%; Pred. No. 2.21e-97; 

Matches 128; Conservative 46; Mismatches 55; Indels 0; Gaps 0; 

Db 1306 ngtgfhgcirnlyinnelqdftktqmkpgvvpgcepcrklyclhgicqpnatpgpmchce 1365 

llhimilllllhlllll I I |::IHIII:I I II III:: :| 1 II 
Qy 14 NGTSFHGCIRNLYINSELQDFRKMPMQTGILPGCEPCHKKVCAHGCCQPSSQSGFTCECE 73

Db 1366 agwvglhcdqpadgpchghkcvhgqcvpldalsyscqcqdgysgalcnqagalaepcrgl 1425
Ihl III II hlllll |:|::hllll I :| :l II:: I :||: : 
Qy 74 EGWMGPLCDQRTNDPCLGNKCVHGTCLPINAFSYSCKCLEGHGGVLCDEEEDLFNPCQMI 133

Db 1426 qclhghcqasgtkgahcvcdpgfsgelceqesecrgdpvrdfhqvqrgyaicqttrplsw 1485 

1111:11 : I |::||:|: |::| |||: :||: I |:||| MM: :|: 
Qy 134 KCKHGKCRLSGVGQPYCECNSGFTGDSCDREISCRGERIRDYYQKQQGYAACQTTKKVSR 193

Db 1486 vecrgscpgqgccqglrlkrrkftfecsdgtsfaeevekptkcgcalca 1534 

:M!|:hl II II llll::IM:||:|| HIM Mil! || 
Qy 194 LECRGGCAGGQCCGPLRSKRRKYSFECTDGSSFVDEVEKWKCGCARCA 242 



RESULT 2 

ID W46967 standard; Protein; 228 AA. 
AC W46967; 

DT 06-JUL-1998 (first entry) 

DE Amino acid sequence of the specification. 

KW Slit-like protein; human; diagnosis; treatment; brain-specific disease; 
KW cancer; antibody. 
OS Mus sp. 
PN J10087699-A. 

PD 07-APR-1998.
PF 15-JUL-1997; 205351.
PR 16-JUL-1996; JP-186219.
PA (ASAH ) ASAHI KASEI KOGYO KK.
DR WPI; 98-267127/24.
DR N-PSDB; V16966.

PT Human Slit-like protein - useful for diagnosis and treatment of
PT brain-specific diseases and cancers
PS Disclosure; Page 35; 45pp; Japanese.

CC The present sequence appears in the specification. The specification 
CC describes a novel human slit-like protein (the mature protein is claimed 
CC in Claim 1). The slit-like polypeptide is useful for diagnosis and 
CC treatment of brain-specific diseases and cancers. Antibodies directed 
CC against the protein, or its fragments can also be used for diagnosing 
CC cancer.
SQ Sequence 228 AA;

Query Match 29.9%; Score 570; DB 30; Length 228;

Best Local Similarity 57.3%; Pred. No. 4.05e-44;

Matches 67; Conservative 20; Mismatches 30; Indels 0; Gaps 0; 

Db 110 ngtsfhgcirnlyinnelqdftktqmkpgvvpgcepcrklyclhgicqpnatpgpvchce 169 

IHIIIIIIIIIIIMIill I I |::||||||:| | I |||;; ;| I I 
Qy 14 NGTSFHGCIRNLYINSELQDFRKMPMQTGILPGCEPCHKKVCAHGCCQPSSQSGFTCECE 73 

Db 170 agwgglhcdqpvdgpchghkcvhgkcvpldalayscqcqdgysgalcnqvgavaepc 226 

II I III : II hlllll l:|::|::||| I :| :| II:: : :|| 
Qy 74 EGWMGPLCDQRTNDPCLGNKCVHGTCLPINAFSYSCKCLEGHGGVLCDEEEDLFNPC 130

RESULT 3

ID W46968 standard; Protein; 183 AA.

AC W46968; 

DT 06-JUL-1998 (first entry) 

DE Amino acid sequence of the specification. 

KW Slit-like protein; human; diagnosis; treatment; brain-specific disease; 

KW cancer; antibody. 

OS Rattus sp. 

PN J10087699-A. 

PD 07-APR-1998.

PF 15-JUL-1997; 205351. 

PR 16-JUL-1996; JP-186219. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

DR WPI; 98-267127/24.

DR N-PSDB; V16967. 

PT Human Slit-like protein - useful for diagnosis and treatment of 

PT brain-specific diseases and cancers 

PS Disclosure; Pages 37-38; 45pp; Japanese. 

CC The present sequence appears in the specification. The specification 

CC describes a novel human slit-like protein (the mature protein is claimed 

CC in Claim 1). The slit-like polypeptide is useful for diagnosis and

CC treatment of brain-specific diseases and cancers. Antibodies directed 

CC against the protein, or its fragments can also be used for diagnosing 



CC cancer.

SQ Sequence 183 AA;

Query Match 26.2%; Score 499; DB 30; Length 183;

Best Local Similarity 62.5%; Pred. No. 4.14e-37;

Matches 60; Conservative 14; Mismatches 22; Indels 0; Gaps 0;

Db 88 ngtsfhgcirnlyinnelqdftktqmkpgvvpgcepcrklyclhgicqpnatpgpvchce 147 

llllllllllllllhlllll I I |::IMIII: I I |||:: :| I 
Qy 14 NGTSFHGCIRNLYINSELQDFRKMPMQTGILPGCEPCHKKVCAHGCCQPSSQSGFTCECE 73 

Db 148 agwgglhcdqpvdgpchghkcvhgkcvpldalaysc 183 

II I III : II 1:11111 |:|::|::||| 
Qy 74 EGWMGPLCDQRTNDPCLGNKCVHGTCLPINAFSYSC 109 



RESULT 4 

ID W68510 standard; Protein; 1872 AA. 

AC W68510; 

DT 06-JAN-1999 (first entry) 

DE Partial human Notch-3 protein.

KW Human; Notch3; transmembrane receptor; lateral inhibition; regulation; 

KW developmental cascade; neurogenic gene; mutant; neurological disorder; 

KW cerebral autosomal dominant arteriopathy; subcortical infarct; CADASIL; 

KW leukoencephalopathy.

OS Homo sapiens.



FH Key Location/Qualifiers

FT Misc_difference 328

FT /note= "encoded by NAN"

FT Misc_difference 401

FT /note= "encoded by GNN"

FT Misc_difference 403

FT /note= "encoded by GNC"

FT Misc_difference 406

FT /note= "encoded by GNN"

FT Misc_difference 409

FT /note= "encoded by NNT"

FT Misc_difference 420

FT /note= "encoded by GNC"

FT Misc_difference 706

FT /note= "encoded by NNN"

FT Misc_difference 708

FT /note= "encoded by CCN"

FT Misc_difference 719

FT /note= "encoded by CGN"

FT Misc_difference 728

FT /note= "encoded by CNT"

FT Misc_difference 729

FT /note= "encoded by GTN"

FT Misc_difference 759..789

FT /note= "encoded by NNN"

FT Misc_difference 1425

FT /note= "encoded by GNA"

PN FR2751985-A1.

PD 06-FEB-1998.

PF 01-AUG-1996; 009733.

PR 01-AUG-1996; FR-009733.

PA (INRM ) INSERM INST NAT SANTE & RECH MEDICALE.

PI Bach JF, Bousser MG, Joutel A, Tournier Lasserve E; 

DR WPI; 98-133137/13.

DR N-PSDB; V57163. 

PT Human Notch3 nucleic acids - and methods for identifying 

PT pre-disposition to cerebral autosomal dominant arteriopathy with 

PT sub-cortical infarcts and leukoencephalopathy 

PS Claim 2; Fig la-lg; 42pp; French. 

CC This sequence represents a partial human notch3 protein, a transmembrane 

CC receptor protein involved in lateral inhibition and regulating 

CC developmental cascades of neurogenic genes. Mutated Notch3 proteins 

CC are thought to be involved in neurological disorders, especially of the 

CC cerebral autosomal dominant arteriopathy with subcortical infarcts and 

CC leukoencephalopathy (CADASIL) type. Blocking expression of a mutated 

CC Notch3 gene or by substitution therapy with non-mutated Notch3 gene or

CC protein can be used to treat CADASIL or related disorders.

SQ Sequence 1872 AA;

Query Match 14.4%; Score 275; DB 36; Length 1872; 

Best Local Similarity 36.9%; Pred. No. 1.76e-15; 

Matches 48; Conservative 20; Mismatches 50; Indels 12; Gaps 11; 

Db 55 dpclsspcahgarcsvgpdgrflcscppgyqgrsc-rsdvdecrvgepcrhggtclntpg 113 

:|| llll I :::: II I I I I : I I :|: I II III : 
Qy 48 EPCHKKVCAHGC-CQPSSQSGFTCECEEGWMGPLCDQRTNDPC-LGNKCVHG-TCLPINA 104 

Db 114 -sfrcqcpagytgplc-enpav-pcapspcrnggtcrqsg-dltydcaclpgfegqnce 168 

I: I I I I II I: : II l::l II II III :|| |::|: 
Qy 105 FSYSCKCLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPY-CECNSGFTGDSCD 162 

Db 169 vnvddcpghr 178 

:: I I I 
Qy 163 REIS-CRGER 171



RESULT 5 

ID W49698 standard; Protein; 2321 AA. 

AC W49698;

DT 21-DEC-1998 (first entry)

DE Human Notch3 protein.

KW Human; Notch3; transmembrane receptor; lateral inhibition; regulation; 

KW developmental cascade; neurogenic gene; mutant; neurological disorder; 

KW cerebral autosomal dominant arteriopathy; subcortical infarct; CADASIL; 

KW leukoencephalopathy; therapy. 

OS Homo sapiens. 

PN FR2751986-A1. 

PD 06-FEB-1998. 

PF 16-APR-1997; 004680. 

PR 01-AUG-1996; FR-009733.

PA (INRM ) INSERM INST NAT SANTE & RECH MEDICALE.

PI Bach JF, Bousser MG, Joutel A, Tournier Lasserve E;

DR WPI; 98-133138/13.

DR N-PSDB; V57001.

PT Human Notch3 nucleic acids - and methods for identifying 

PT pre-disposition to cerebral autosomal dominant arteriopathy with 

PT sub-cortical infarcts and leukoencephalopathy

PS Claim 2; Fig 1.1-1.8; 45pp; French. 

CC This sequence represents the human Notch3 protein, a transmembrane 

CC receptor protein involved in lateral inhibition and regulating 

CC developmental cascades of neurogenic genes. Mutated Notch3 proteins 

CC are thought to be involved in neurological disorders, especially of

CC the cerebral autosomal dominant arteriopathy with subcortical infarcts

CC and leukoencephalopathy (CADASIL) type. Blocking expression of a

CC mutated Notch3 gene or by substitution therapy with non-mutated Notch3

CC gene or protein can be used to treat CADASIL or related disorders.

SQ Sequence 2321 AA; 

Query Match 14.4%; Score 275; DB 36; Length 2321; 

Best Local Similarity 36.9%; Pred. No. 1.76e-15;

Matches 48; Conservative 20; Mismatches 50; Indels 12; Gaps 11; 

Db 121 dpclsspcahgarcsvgpdgrflcscppgyqgrsc-rsdvdecrvgepcrhggtclntpg 179 

:H llll I : : I I I I I I : I I :|: I II III : 
Qy 48 EPCHKKVCAHGC-CQPSSQSGFTCECEEGWMGPLCDQRTNDPC-LGNKCVHG-TCLPINA 104 

Db 180 -sfrcqcpagytgplc-enpav--pcapspcrnggtcrqsg-dltydcaclpgfegqnce 234 

I: I I I Ml I: : II |::| llll I I I :|| |::|: 
Qy 105 FSYSCKCLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPY-CECNSGFTGDSCD 162

Db 235 vnvddcpghr 244 
I I I 

Qy 163 REIS-CRGER 171 



RESULT 6 

ID W39256 standard; protein; 612 AA.
AC W39256; 

DT 19-MAY-1998 (first entry) 

DE Human partial mature membrane protein. 



KW Epidermal growth factor motif; EGF motif; membrane protein; disease; 

KW brain; nervous tissue; cancer.

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT Protein 1..612

FT /note= "partial mature protein"

PN J10036395-A. 

PD 10-FEB-1998. 

PF 24-JUL-1996; 194467. 

PR 24-JUL-1996; JP-194467. 

PA (ASAH ) ASAHI KASEI KOGYO KK.

DR WPI; 98-174912/16. 

PT New human membrane protein - specifically expressed in brain and 

PT nervous tissue; used in diagnosis of diseases specific to these 

PT tissues and cancer 

PS Claim 1; Column 18-19; 26pp; Japanese. 

CC W39256 represents the partial mature amino acid sequence of a novel 

CC membrane protein which contains epidermal growth factor (EGF) motifs. 

CC The new membrane protein is expressed specifically in brain and nervous 

CC tissue. The protein and DNA can be used in the diagnosis of brain and 

CC nerve system specific diseases and cancer. 

SQ Sequence 612 AA; 

Query Match 13.8%; Score 263; DB 28; Length 612;

Best Local Similarity 39.8%; Pred. No. 2.33e-14;

Matches 45; Conservative 22; Mismatches 35; Indels 11; Gaps 10; 

Db 375 crngatci-sslsgftcqcpegyfgsaceekv-dpcasspcqnngtcy-vdgvhftcncs 431 

I :l I II ll!ll:| II I: |::: III :: I :||| ::|:| 
Qy 55 CAHGC-CQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKC-VHGTCLPINAFSYSCKCL 112 

Db 432 pgftgptcaqlid-f--calspcahgtcr-s-vgtsykclcdpgyhglyceee 479 

I I I : II I : I llll I II :| I |::|: I h I 
Qy 113 EGHGGVLCDEEEDLFNPCQMIKCKHGKCRLSGVGQPY-CECNSGFTGDSCDRE 164 



RESULT 7 

ID W39257 standard; protein; 737 AA. 

AC W39257; 

DT 19-MAY-1998 (first entry) 

DE Human membrane protein. 

KW Epidermal growth factor motif; EGF motif; membrane protein; disease; 

KW brain; nervous tissue; cancer; disease. 

OS Homo sapiens. 



FH Key Location/Qualifiers 

FT Peptide 1..26 
FT /label= signal

FT Protein 27..737

FT /label= membrane_protein



PN J10036395-A. 

PD 10-FEB-1998. 

PF 24-JUL-1996; 194467. 

PR 24-JUL-1996; JP-194467. 

PA (ASAH ) ASAHI KASEI KOGYO KK.

DR WPI; 98-174912/16. 

DR N-PSDB; V09641. 

PT New human membrane protein - specifically expressed in brain and

PT nervous tissue; used in diagnosis of diseases specific to these 

PT tissues and cancer 

PS Claim 2; Pages 19-21; 26pp; Japanese.

CC W39257 represents the amino acid sequence of a novel membrane protein

CC which contains epidermal growth factor (EGF) motifs. The new membrane

CC protein is expressed specifically in brain and nervous tissue. The 

CC protein and DNA can be used in the diagnosis of brain and nerve system 

CC specific diseases and cancer.

SQ Sequence 737 AA; 

Query Match 13.8%; Score 263; DB 28; Length 737;

Best Local Similarity 39.8%; Pred. No. 2.33e-14;

Matches 45; Conservative 22; Mismatches 35; Indels 11; Gaps 10; 

Db 401 crngatci-sslsgftcqcpegyfgsaceekv-dpcasspcqnngtcy-vdgvhftcncs 457

I :| I II llllhl II I: |::: III :: I :||| ::: ::|:| 
Qy 55 CAHGC-CQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKC-VHGTCLPINAFSYSCKCL 112

Db 458 pgftgptcaqlid-f--calspcahgtcr-s-vgtsykclcdpgyhglyceee 505 

I I I : II I : I II II I II :| I |::|: I |: I 
Qy 113 EGHGGVLCDEEEDLFNPCQMIKCKHGKCRLSGVGQPY-CECNSGFTGDSCDRE 164



RESULT 8

ID W18351 standard; protein; 1036 AA. 

AC W18351; 

DT 11-FEB-1998 (first entry)

DE Proliferation and differentiation suppression polypeptide. 

KW Proliferation; differentiation; suppression; human; delta-1; 

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour; 

KW immunosuppression. 

OS Homo sapiens. 

PN W09719172-A1. 

PD 29-MAY-1997.

PF 15-NOV-1996; J03356. 

PR 30-NOV-1995; JP-311811. 

PR 17-NOV-1995; JP-299611.

PA (ASAH ) ASAHI KASEI KOGYO KK.

PI Itoh A, Sakano S;

DR WPI; 97-298110/27.

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 5; Page 66-71; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the 

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression. 

SQ Sequence 1036 AA; 

Query Match 13.5%; Score 258; DB 25; Length 1036; 

Best Local Similarity 37.0%; Pred. No. 6.84e-14; 

Matches 44; Conservative 20; Mismatches 45; Indels 10; Gaps 9;

Db 566 nvcgphgkcksqsggkftcdcnkgftgtychenind-cesnpcrnggtcid-gvnsykci 623 

:||: II I : I : lll:|: I I ! : 1 1 I : I I : I 1 1 : II I 
Qy 53 KVCA-HGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 110

Db 624 csdgwegayc--etni-ndcsqnpchnggtcr-dlvndfycdckngwkgktchsrdsqc 678 

I :| I I I I I I : !l:|::| I :| |: I 

Qy 111 CLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPYCECNSGFTGDSCD-REISC 167 



RESULT 9

ID W18352 standard; protein; 1187 AA.

AC W18352;

DT 11-FEB-1998 (first entry)

DE Proliferation and differentiation suppression polypeptide. 

KW Proliferation; differentiation; suppression; human; delta-1; 

KW serrate-1; blood cell; neuron; leukaemia; malignant tumour; 

KW immunosuppression.

OS Homo sapiens. 

PN W09719172-A1. 

PD 29-MAY-1997.

PF 15-NOV-1996; J03356.

PR 30-NOV-1995; JP-311811.

PR 17-NOV-1995; JP-299611.

PA (ASAH ) ASAHI KASEI KOGYO KK.

PI Itoh A, Sakano S;

DR WPI; 97-298110/27.

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress 

PT proliferation and differentiation of undifferentiated human blood 

PT cells 

PS Claim 6; Page 71-76; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the



CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression.

SQ Sequence 1187 AA; 

Query Match 13.5%; Score 258; DB 25; Length 1187; 

Best Local Similarity 37.0%; Pred. No. 6.84e-14; 

Matches 44; Conservative 20; Mismatches 45; Indels 10; Gaps 9; 

Db 566 nvcgphgkcksqsggkftcdcnkgftgtychenind-cesnpcrnggtcid-gvnsykci 623 

:||: II I : I : Nhh I I I : I M : I I : I 1 1 : III 
Qy 53 KVCA-HGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 110 

Db 624 csdgwegayc--etni-ndcsqnpchnggtcr-dlvndfycdckngwkgktchsrdsqc 678 

. I :| I I I :: I I Ml I : lh|::| I :| |: I 
Qy 111 CLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPYCECNSGFTGDSCD-REISC 167 



RESULT 10 

ID W40827 standard; Protein; 1208 AA.

AC W40827; 

DT 21-MAY-1998 (first entry) 

DE Human Jagged protein. 

KW Jagged; Notch; angiogenesis; endothelial cell; migration; human; 

KW wound repair; vulnerary; injury repair; signal transduction; 

KW motor neurone disease; amyotrophic lateral sclerosis; polymyelitis; 

KW diagnosis; therapy.

OS Homo sapiens.

FH Key Location/Qualifiers

FT Peptide 1..11

FT /label= Sig_peptide

FT Domain 175..220

FT /note= "DSL (Delta, Serrate, Lag-2 and Apx-1)

FT domain"

FT Region 224..852

FT /note= "EGF-like repeat region containing 16

FT EGF repeats"

FT Misc_difference 526

FT /note= "encoded by ANC"

FT Region 853..992

FT /note= "cysteine-rich region"

FT Domain 1058..1083

FT /note= "transmembrane domain"

FT Region 1084..1208

FT /note= "cytoplasmic region"

PN W09745143-A1. 

PD 04-DEC-1997. 

PF 30-MAY-1997; U09407. 

PR 31-MAY-1996; US-018841. 

PA (NAAM-) NAT AMERICAN RED CROSS. 

PA (UYGE-) UNIV GENEVE. 

PI Maciag T, Montesano R, Pepper M, Wong MR, Zimrin AB; 

DR WPI; 98-032340/03.

DR N-PSDB; V03674. 

PT New human Jagged protein - used to inhibit or promote angiogenesis 

PT and to control migration of endothelial cells in injured blood 

PT vessels 

PS Claim 2; Page 54-61; 81pp; English. 

CC This sequence comprises the human homologue of the rat Jagged 

CC protein. Jagged is able to bind Notch protein and is involved in

CC endothelial cell (EC) migration and differentiation. The human 

CC Jagged amino acid sequence was deduced from a human endothelial 

CC cell cDNA (see V03674) induced by exposure to fibrin. Jagged

CC polypeptides can be expressed in host cell systems. A method for 

CC treating or preventing disease by administering an agent that 

CC (ant)agonises, inhibits, prevents, enhances or stimulates function

CC of the Notch or Jagged proteins is claimed, as well as a method for 

CC affecting differentiation of mesoderm, endoderm, ectoderm and/or 

CC neuroderm cells. When Jagged is applied to a micro-diameter blood

CC vessel from which ECs have been removed, damaged or reduced, it 

CC decrease migrations of EC to the site, but when delivered to a 

CC similar site on a large vessel it increases EC migration. Jagged 

CC and its agonists are used to inhibit or prevent angiogenesis (where 
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CC associated with solid tumours, rheumatoid arthritis, inflammation, 

CC or restenosis, particularly preventing angiogenesis from the vaso 

CC vasorum and promoting large vessel EC migration to repair the lumen 

CC of large vessels). Anti-Jagged and Jagged antagonists (e.g. 

CC antisense Jagged and Jagged mutants) are used to promote or enhance 

CC angiogenesis, particularly for wound and injury repair, e.g. where 

CC surgical, traumatic and/or caused by disease, e.g. diabetes -related 

CC (all claimed). Angiogenesis can be modulated in vitro or in vivo 

CC and expression of proteins by gene therapy is included. Modulation 

CC of the Notch-Jagged signalling pathway may also be involved in 

CC placental development and motor neurone diseases such as 

CC amyotrophic lateral sclerosis, poliomyelitis etc. 

SQ Sequence 1208 AA;

Query Match 13.5%; Score 258; DB 28; Length 1208;

Best Local Similarity 37.0%; Pred. No. 6.84e-14;

Matches 44; Conservative 20; Mismatches 45; Indels 10; Gaps 9;

Db 587 nvcgphgkcksqsggkftcdcnkgftgtychenind-cesnpcrnggtcid-gvnsykci 644
:M: II I : I : 1 1 1 : 1 : I II : II I :| I :| ||: III 
Qy 53 KVCA-HGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 110

Db 645 csdgwegayc--etni-ndcsqnpchnggtcr-dlvndfycdckngwkgktchsrdsqc 699

MM I : H:|::| hi I: 

Qy 111 CLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPYCECNSGFTGDSCD-REISC 167 



RESULT 11 

ID W44301 standard; Protein; 1218 AA. 

AC W44301; 

DT 19-JUN-1998 (first entry) 

DE Human serrate 1. 

KW Human; serrate 2; regulation; stem cell; differentiation; neoplasm; 

KW leukaemia; endothelial cell; tumour. 

OS Homo sapiens.
FH   Key             Location/Qualifiers
FT   Peptide         1..31
FT                   /label= Signal
FT   Protein         32..1218
FT                   /label= Serrate-1
PN WO9802458-A1.
PD 22-JAN-1998.
PF 11-JUL-1997; J02414.
PR 14-MAY-1997; JP-124063.
PR 16-JUL-1996; JP-186220.
PA (ASAH ) ASAHI KASEI KOGYO KK.
PI Itoh A, Sakano S;

DR WPI; 98-110528/10. 

DR N-PSDB; V15201. 

PT Human serrate-2 gene expression products - used to regulate stem 

PT cell differentiation, useful in treating neoplasms, e.g. leukaemia 

PS Disclosure; Page 77-86; 103pp; Japanese. 

CC The present sequence represents human serrate 1, from the present 

CC invention which describes human serrate 2. The present invention also 

CC describes a method for the preparation of the polypeptides, and 

CC antibodies binding to the polypeptide and its fragments. The polypeptide

CC and its fragments expressed by the serrate-2 gene can be used to inhibit

CC stem (especially blood stem) cell differentiation and to inhibit 

CC endothelial cell growth. They may be incorporated in a cell culture 

CC media for culturing undifferentiated stem cells. They can also be used 

CC for treatment of neoplasms such as leukaemia. The antibodies can be used 

CC for the diagnosis of malignant tumours. 

SQ Sequence 1218 AA;

Query Match 13.5%; Score 258; DB 29; Length 1218;

Best Local Similarity 37.0%; Pred. No. 6.84e-14;

Matches 44; Conservative 20; Mismatches 45; Indels 10; Gaps 9; 



Db 597 nvcgphgkcksqsggkftcdcnkgftgtychenind-cesnpcrnggtcid-gvnsykci 654 

:||: II I : I : ll|:|: I I I : 1 1 I : I I : I 1 1 : III 
Qy 53 KVCA-HGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 110 

Db 655 csdgwegayc--etni-ndcsqnpchnggtcr-dlvndfycdckngwkgktchsrdsqc 709 



I :| I I I :: I I Mill: ll:|;:| I :||: I 
Qy 111 CLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPYCECNSGFTGDSCD-REISC 167 
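
The feature-table (FT) lines in the entries above pair a feature key with a residue range. As a minimal sketch, assuming the key and location are separated by whitespace and ranges are written `start..end`, such lines could be parsed as follows. The function name `parse_ft_key` is hypothetical and is not part of any database toolkit.

```python
import re

# Assumed layout: "FT", a feature key, then "start..end" or a single position.
FT_KEY_RE = re.compile(r"^FT\s+(\S+)\s+(\d+)(?:\.\.(\d+))?\s*$")


def parse_ft_key(line):
    """Return (key, start, end) for an FT key line, or None for other lines.

    Qualifier lines such as 'FT /label= Signal' do not match and give None.
    A single position (e.g. '526') is returned with start == end.
    """
    m = FT_KEY_RE.match(line)
    if m is None:
        return None
    key = m.group(1)
    start = int(m.group(2))
    end = int(m.group(3)) if m.group(3) else start
    return key, start, end


if __name__ == "__main__":
    for text in ("FT Peptide 1..31", "FT /label= Signal", "FT Protein 32..1218"):
        print(text, "->", parse_ft_key(text))
```

The length of a feature is then `end - start + 1`, so the Protein feature 32..1218 covers 1187 residues.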



RESULT 12

ID W05833 standard; Protein; 1218 AA.
AC W05833;
DT 28-JAN-1997 (first entry)
DE Human Serrate-1 (HJ1).
KW Serrate-1; human jagged-1; HJ1; Notch; cell differentiation;
KW cell fate; central nervous system; cancer; tissue repair; therapy;
KW diagnosis; antibody.
OS Homo sapiens.
FH   Key             Location/Qualifiers
FT   domain          1..1067
FT                   /label= Extracellular_domain
FT   peptide         14..29
FT                   /label= Sig_peptide
FT   domain          185..229
FT                   /label= DSL
FT                   /note= "region of homology with Drosophila Delta
FT                   and Serrate, predicted to mediate binding
FT                   with Notch"
FT   domain          234..896
FT                   /label= ELR
FT                   /note= "epidermal growth factor-like repeat domain"
FT   region          234..264
FT                   /label= ELR1
FT   region          265..299
FT                   /label= ELR2
FT   region          300..339
FT                   /label= ELR3
FT   region          340..377
FT                   /label= ELR4
FT   region          378..415
FT                   /label= ELR5
FT   region          416..453
FT                   /label= ELR6
FT   region          454..490
FT                   /label= ELR7
FT   region          491..528
FT                   /label= ELR8
FT   region          529..566
FT                   /label= ELR9
FT   region          567..598
FT                   /label= Partial_ELR
FT   region          599..632
FT                   /label= Partial_ELR
FT   region          633..670
FT                   /label= ELR10
FT   region          671..708
FT                   /label= ELR11
FT   region          709..747
FT                   /label= ELR12
FT   region          748..785
FT                   /label= ELR13
FT   region          786..823
FT                   /label= ELR14
FT   region          824..862
FT                   /label= ELR15
FT   region          863..879
FT                   /label= Partial_ELR
FT   region          880..896
FT                   /label= Partial_ELR
FT   domain          1068..1089
FT                   /label= Transmembrane_domain
FT   domain          1090..1218
FT                   /label= Intracellular_domain
PN WO9627610-A1.
PD 12-SEP-1996.
PF 07-MAR-1996; U03172.
PR 07-MAR-1995; US-400159.
PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY.
PA (UYYA ) UNIV YALE.
PI Artavanis-Tsakonas S, Gray GE, Henrique DMP, Ish-Horowicz D;
PI Lewis JH, Mann RS, Myat AM;
DR WPI; 96-425379/42.
DR N-PSDB; T40090.
PT Vertebrate Serrate protein and related DNA - used to treat or
PT prevent malignancies characterised by increased Notch activity.
PS Claim 4; Page 95-98; 161pp; English.
CC Human Serrate-1 (W05833) and human Serrate-2 (W05833) are ligands

CC for the zygotic neurogenic locus Notch, and are believed to play a 

CC major role in determining cell fates (differentiation) in the 

CC central nervous system. Their amino acid sequences were deduced 

CC from cDNA clones (see also T40090-91) isolated from human foetal 

CC brain cDNA libraries. The proteins, antibodies raised to them, 

CC and encoding nucleic acids can be used in the detection of 

CC Serrate sequences and in the treatment of disorders of cell fate 

CC or differentiation, partic. cancer, nervous system disorders

CC and in tissue repair or regeneration.

SQ Sequence 1218 AA;

Query Match 13.5%; Score 258; DB 19; Length 1218;

Best Local Similarity 37.0%; Pred. No. 6.84e-14;

Matches 44; Conservative 20; Mismatches 45; Indels 10; Gaps 9;

Db 597 nvcgphgkcksqsggkftcdcnkgftgtychenind-cesnpcrnggtcid-gvnsykci 654 

:lh II I : I : I I I : 1 1 I : I I : I 1 1 : I 

Qy 53 KVCA-HGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 110

Db 655 csdgwegayc--etni-ndcsqnpchnggtcr-dlvndfycdckngwkgktchsrdsqc 709 

I :l I I I I I Mil I : ||:|::| I :| |: I 
Qy 111 CLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPYCECNSGFTGDSCD-REISC 167



RESULT 13

ID W05835 standard; Protein; 1193 AA.
AC W05835;
DT 28-JAN-1997 (first entry)
DE Chick Serrate.
KW C-Serrate; Notch; cell differentiation; cell fate; tissue repair;
KW central nervous system; cancer; therapy; diagnosis.
OS Gallus sp.
FH   Key             Location/Qualifiers
FT   domain          1..1041
FT                   /label= Extracellular_domain
FT   peptide         1..5
FT                   /label= Sig_peptide
FT                   /note= "lacks the N-terminal portion owing to
FT                   truncation of the encoding cDNA clone"
FT   domain          158..203
FT                   /label= DSL
FT                   /note= "region of homology with Drosophila Delta
FT                   and Serrate, predicted to mediate binding
FT                   with Notch"
FT   domain          208..837
FT                   /label= ELR
FT                   /note= "epidermal growth factor-like repeat domain"
FT   region          208..238
FT                   /label= ELR1
FT   region          239..274
FT                   /label= ELR2
FT   region          275..313
FT                   /label= ELR3
FT   region          314..351
FT                   /label= ELR4
FT   region          352..390
FT                   /label= ELR5
FT   region          391..427
FT                   /label= ELR6
FT   region          428..464
FT                   /label= ELR7
FT   region          465..502
FT                   /label= ELR8
FT   region          503..540
FT                   /label= ELR9
FT   region          541..606
FT                   /label= ELR10
FT   region          607..644
FT                   /label= ELR11
FT   region          655..682
FT                   /label= ELR12
FT   region          683..721
FT                   /label= ELR13
FT   region          722..759
FT                   /label= ELR14
FT   region          760..797
FT                   /label= ELR15
FT   region          798..837
FT                   /label= ELR16
FT   region          854..911
FT                   /label= Cysteine-rich_region
FT   domain          1042..1066
FT                   /label= Transmembrane_domain
FT   domain          1067..1193
FT                   /label= Intracellular_domain
PN WO9627610-A1.
PD 12-SEP-1996.
PF 07-MAR-1996; U03172.
PR 07-MAR-1995; US-400159.
PA (IMCR ) IMPERIAL CANCER RES TECHNOLOGY.
PA (UYYA ) UNIV YALE.
PI Artavanis-Tsakonas S, Gray GE, Henrique DMP, Ish-Horowicz D;
PI Lewis JH, Mann RS, Myat AM;
DR WPI; 96-425379/42.
DR N-PSDB; T40092.
PT Vertebrate Serrate protein and related DNA - used to treat or
PT prevent malignancies characterised by increased Notch activity.
PS Disclosure; Page 112-115; 161pp; English.
CC Chicken Serrate (W05835), or C-Serrate, is a ligand for the zygotic
CC neurogenic locus Notch and is believed to play a major role in
CC determining cell fates in the central nervous system. Its amino
CC acid sequence was deduced from a cDNA clone (T40092) obtd. from an
CC optic explant cDNA library. C-Serrate is expressed in the central
CC nervous system, cranial placodes, nephric mesoderm, vascular
CC system, and limb bud mesenchyme.
SQ Sequence 1193 AA;

Query Match 13.4%; Score 256; DB 19; Length 1193; 

Best Local Similarity 37.0%; Pred. No. 1.05e-13; 

Matches 44; Conservative 21; Mismatches 44; Indels 10; Gaps 9;

Db 571 nvcgphgkcksqaggkftcecnkgftgtychenind-cesnpcknggtcid-gvnsykci 628 

:||: III::: lllll: I I I : 1 1 I : I I : II I : III 
Qy 53 KVCA-HGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 110 

Db 629 csdgwegtyc--etni-ndcsknpchnggtcr-dlvndffceckngwkgktchsrdsqc 683 

I :| I I I :: II : I :| II I : :|||::| I :| |: I 
Qy 111 CLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPYCECNSGFTGDSCD-REISC 167



RESULT 14 

ID W18354 standard; protein; 1218 AA. 

AC W18354; 

DT 11-FEB-1998 (first entry)
DE Proliferation and differentiation suppression polypeptide.
KW Proliferation; differentiation; suppression; human; delta-1;
KW serrate-1; blood cell; neuron; leukaemia; malignant tumour;
KW immunosuppression.
OS Homo sapiens.
FH   Key             Location/Qualifiers
FT   Peptide         1..31
FT                   /label= Signal
FT   Protein         32..1218
FT                   /label= Differentiation_suppression_protein
PN WO9719172-A1.
PD 29-MAY-1997.
PF 15-NOV-1996; J03356.
PR 30-NOV-1995; JP-311811.

PR 17-NOV-1995; JP-299611. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

PI Itoh A, Sakano S; 

DR WPI; 97-298110/27. 

DR N-PSDB; T70175. 

PT Peptide(s) encoded by human genes delta-1 and serrate-1 - suppress

PT proliferation and differentiation of undifferentiated human blood

PT cells 

PS Claim 15; Page 83-91; 114pp; Japanese. 

CC The present sequence represents a polypeptide which suppresses 

CC proliferation and differentiation of undifferentiated cells such 

CC as neurons and blood cells. The polypeptide may be used for the 

CC prevention and control of disorders involving undifferentiated 

CC cells, such as leukaemia and malignant tumours, and improvement of 

CC blood formation, e.g. after immunosuppression.
SQ Sequence 1218 AA;

Query Match 12.9%; Score 246; DB 25; Length 1218;
Best Local Similarity 36.1%; Pred. No. 8.92e-13;
Matches 43; Conservative 21; Mismatches 45; Indels 10;



Db 597 nvcgphgkcksqsggkftcdcnkgftgtycheninds-esnpcrnggtcid-gvnsykci 654 

:M: II I : I : ll|:|: I I I : ||: :| I :| II: II | 
Qy 53 KVCA-HGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 110 

Db 655 csdgwegayc--etni-ndcsqnpchnggtcr-dlvndfycdckngwkgktchsrdsqc 709 

Mill: ||:|::| | :| |: | 

Qy 111 CLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPYCECNSGFTGDSCD-REISC 167



RESULT 15

ID R25079 standard; Protein; 1480 AA.
AC R25079;
DT 05-JAN-1993 (first entry)
DE Drosophila SLIT protein involved in axon pathway development.
KW Neurogenesis; EGF-like repeats; epidermal growth factor; TAGON;
KW embryonic CNS; leucine-rich repeat; Flank-LRR-Flank;
KW midline glial cells; axonogenesis; cell-cell interaction; ss.
OS Drosophila melanogaster.
FH   Key             Location/Qualifiers
FT   peptide         1..36
FT                   /label= signal
FT   domain          73..294
FT                   /label= Flank_LRR_Flank_1
FT                   /note= "mediates adhesive events"
FT   domain          295..518
FT                   /label= Flank_LRR_Flank_2
FT                   /note= "mediates adhesive events"
FT   domain          519..714
FT                   /label= Flank_LRR_Flank_3
FT                   /note= "mediates adhesive events"
FT   domain          715..910
FT                   /label= Flank_LRR_Flank_4
FT                   /note= "mediates adhesive events"
FT   region          911..1150
FT                   /label= Tandem_EGF_like_repeats
FT                   /note= "involved in protein-protein interactions"
FT   region          1353..1393
FT                   /label= 7th_EGF_like_repeat
FT                   /note= "involved in receptor-ligand interactions"
FT   region          1394..1404
FT                   /label= alternative_splice_segment
FT                   /note= "developmentally regulated"
FT   region          1405..1480
FT                   /label= C-terminal_region
PN WO9210518-A.
PD 25-JUN-1992.
PF 27-NOV-1991; U09055.
PR 07-DEC-1990; US-624135.
PA (UYYA ) UNIV YALE.
PI Artavanis-Tsakonas S, Rothberg JM;
DR WPI; 92-234590/28.


DR N-PSDB; Q25811. 

PT SLIT protein and sequence elements for treating 

PT neurodegenerative disease - useful for Alzheimer's 

PT nerve damage and Parkinson's disease, for diagnosis of cancer 

PS Claim 1; Page 84-89; 122pp; English. 

CC The SLIT protein is necessary for normal development of the midline 

CC of the CNS, partic. the midline glial cells, and for the 

CC concomitant formation of the commissural axon pathways. The process

CC is dependent on the level of SLIT protein expression. It appears 

CC that SLIT protein is excreted by the midline glial cells where it 

CC is synthesised and is eventually associated with the surfaces of 

CC axons that traverse them. The SLIT protein is tightly localised to

CC the muscle attachment sites and to the sites of contact between 

CC adjacent pairs of cardioblasts as they coalesce to form the lumen 

CC of the larval heart. The SLIT protein defines a new set of 

CC molecules (TAGONS) which play a key role in axon outgrowth and 

CC pathfinding. SLIT can be used as a nerve regenerative in 

CC neurodegenerative diseases such as Alzheimer's Disease, spinal cord 

CC injuries, brain injuries, crushed (optic) nerve, amyotrophic lateral

CC sclerosis, diabetes-caused nerve damage, Parkinson's Disease, 

CC strokes, epilepsy, multiple sclerosis, paraplegia, retinal

CC degeneration and facial nerve damage. The 4 "Flank-Leucine-rich 

CC region-Flank" domains, the C-terminal region and part of the 

CC alternative splice segment (i.e. GEGSTEPFTVT) are all individually 

CC claimed as are molecules comprising at least 1 Flank-LRR-Flank domain

CC and at least 1 EGF-like repeat element from the SLIT protein. 

CC See also R29102. 

SQ Sequence 1480 AA; 

Query Match 12.9%; Score 245; DB 5; Length 1480; 

Best Local Similarity 31.9%; Pred. No. 1.10e-12; 



Db 900 ...
Qy 39 ...

Db 959 ...
Qy 97 ...

Db 1014 ...
Qy 154 ...

Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on:          Fri May 28 09:40:24 1999; MasPar time 12.37 Seconds
                 786.870 Million cell updates/sec

Tabular output not generated.



Title:           US-09-191-647-14
Description:     (1-243) from US09191647.pep
Perfect Score:   1905
Sequence:        1 ILDVASLRQAPGENGTSFHG SSFVDEVEKWRCGCARCAS 243

Scoring table:   PAM 150
                 Gap 11

Searched:        122810 seqs, 40068593 residues

Post-processing: Minimum Match 0%
                 Listing first 45 summaries

Database:        pir60
                 1:pir1 2:pir2 3:pir3 4:pir4

Statistics:      Mean 40.820; Variance 67.040; scale 0.609

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution. 
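
The run header above describes a Smith-Waterman local alignment search and reports its speed in cell updates per second. The following is a minimal sketch of the scoring recurrence, not the MPsrch implementation: it assumes a simple match/mismatch score in place of the PAM 150 table and treats the gap value 11 as a linear per-residue penalty, both of which are illustrative assumptions. The helper `cell_updates_per_second` assumes one cell update per query-residue and database-residue pair.

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=11):
    """Best local alignment score using a linear gap penalty.

    match/mismatch stand in for a substitution table such as PAM 150
    (an assumption made to keep the sketch self-contained).
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] - gap        # gap in the target
            left = curr[j - 1] - gap  # gap in the query
            cell = max(0, diag, up, left)  # local alignment never drops below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def cell_updates_per_second(query_length, database_residues, seconds):
    """Dynamic-programming cells filled per second of search time."""
    return query_length * database_residues / seconds


if __name__ == "__main__":
    print(smith_waterman("CECEEG", "CECEEG"))
    print(cell_updates_per_second(243, 40068593, 12.37) / 1e6)
```

With the figures reported above (query length 243, 40068593 residues searched, 12.37 seconds) the helper gives roughly 787 million cell updates per second, close to the reported 786.870 million; the small difference is consistent with rounding of the reported time.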



                                SUMMARIES

Result         Query
   No.  Score  Match  Length  DB  ID      Description             Pred. No.
     1    292   15.3    1429   2  S06434  homeotic protein lin-   1.04e-42
     2    290   15.2    1203   2  A49175  Motch B protein - mou   2.98e-42
     3    285   15.0    2471   2  A49128  cell-fate determining   4.14e-41
     4    274   14.4    2318   2  S45306  notch 3 protein - mou   1.32e-38
     5    275   14.4    2321   2  S78549  notch3 protein - huma   7.83e-39
     6    269   14.1    1220   2  A56136  jagged protein precur   1.79e-37
     7    264   13.9    1469   2  B36665  slit protein 2 precur   2.41e-36
     8    262   13.8    2524   2  A35844  Xotch protein - Afric   6.82e-36
     9    260   13.6    2531   2  S18188  notch protein homolog   1.92e-35
    10    260   13.6    2531   2  A46019  gene Notch-1 protein    1.92e-35
    11    254   13.3     530   2  A31640  epidermal growth fact   4.27e-34
    12    254   13.3    1480   2  A36665  slit protein 1 precur   4.27e-34
    13    251   13.2    2437   2  S42612  transmembrane protein   2.00e-33
    14    245   12.9    2139   2  A35672  crumbs protein - frui   4.36e-32
    15    246   12.9    2555   2  A40043  notch protein homolog   2.61e-32
    16    244   12.8    2703   2  A24420  notch protein - fruit   7.27e-32
    17    242   12.7     570   2  A48836  fibropellin C precurs   2.02e-31
    18    239   12.5    1295   2  A32901  glp1 protein precurso   9.35e-31
    19    234   12.3     473   2  A56175  adhesive plaque prote   1.19e-29
    20    234   12.3     832   2  A31246  neurogenic protein De   1.19e-29
    21    234   12.3     880   2  S00670  gene Delta protein pr   1.19e-29
    22    233   12.2     833   2  S19087  gene Delta protein pr   1.98e-29
    23    230   12.1    1064   2  A40136  fibropellin Ia - sea    9.05e-29
    24    231   12.1    4391   2  A38096  perlecan precursor -    5.45e-29
    25    219   11.5     861   2  A48825  Notch homolog Motch p   2.30e-26
    26    216   11.3     728   2  I50719  C-Delta-1 - chicken     1.03e-25
    27    213   11.2     387   2  B49175  Motch A protein - mou   4.61e-25
    28    213   11.2     722   2  I48324  DELTA-like 1 - mouse    4.61e-25
    29    211   11.1     293   2  B26637  neurogenic repetitive   1.25e-24
    30    212   11.1    5147   1  IJFFTM  cadherin-related tumo   7.58e-25
    31    208   10.9    3707   2  S18252  heparan sulfate prote   5.52e-24
    32    203   10.7     200   2  A26637  neurogenic repetitive   6.52e-23
    33    203   10.7    1404   2  A36666  serrate protein precu   6.52e-23
    34    203   10.7    1408   2  S16148  gene serrate protein    6.52e-23
    35    196   10.3     259   2  S48713  fetal antigen 1 - hum   2.02e-21
    36    197   10.3     383   2  B45484  delta-like dlk homeot   1.24e-21
    37    197   10.3     383   2  S53716  homeotic protein dlk    1.24e-21
    38    197   10.3    1268   2  S52781  neurocan - mouse        1.24e-21
    39    195   10.2    1257   2  S28764  neurocan - rat          3.29e-21
    40    189    9.9     260   2  A44549  fetal antigen 1 homeo   6.08e-20
    41    188    9.9    2409   2  A60979  versican precursor -    9.86e-20
    42    185    9.7    1959   1  AGRT    agrin - rat             4.19e-19
    43    185    9.7    2397   2  A55535  versican precursor -    4.19e-19
    44    183    9.6     862   2  S43922  versican - pig-tailed   1.10e-18
    45    179    9.4     385   2  S53718  homeotic protein dlk    7.44e-18




RESULT 1
ENTRY           S06434 #type complete
TITLE           homeotic protein lin-12 precursor - Caenorhabditis elegans
ORGANISM        #formal_name Caenorhabditis elegans
DATE            29-Jan-1993 #sequence_revision 29-Jan-1993 #text_change
                12-Sep-1997
ACCESSIONS      S06434; A24769
REFERENCE       S06434
   #authors     Yochem, J.; Weston, K.; Greenwald, I.
   #journal     Nature (1988) 335:547-550
   #title       The Caenorhabditis elegans lin-12 gene encodes a
                transmembrane protein with overall similarity to D
                Notch.
   #cross-references MUID:88334747
   #accession   S06434
   ##molecule_type DNA
   ##residues   1-1429 ##label YOC
   ##cross-references EMBL:M12069; NID:g156357; PID:g156358
REFERENCE       A24769
   #authors     Greenwald, I.
   #journal     Cell (1985) 43:583-590
   #cross-references MUID:86079540
   #accession   A24769
   ##molecule_type DNA
   ##residues   173-712 ##label GRE
GENETICS
   #introns     50/2; 90/1; 109/1; 172/3; 545/1; 589/2; 632/2; 1273/3; 1389/3
CLASSIFICATION  #superfamily unassigned ankyrin repeat proteins; ankyrin
                repeat homology
KEYWORDS        glycoprotein; transmembrane protein
FEATURE
   909-931      #domain transmembrane #status predicted #label TMM\
   1093-1125    #domain ankyrin repeat homology #label AN1\
   1206-1238    #domain ankyrin repeat homology #label AN2\
   1240-1272    #domain ankyrin repeat homology #label AN3
SUMMARY         #length 1429 #molecular-weight 157114 #checksum 4196

Query Match 15.3%; Score 292; DB 2; Length 1429;
Best Local Similarity 40.5%; Pred. No. 1.04e-42;
Matches 47; Conservative 24; Mismatches 35; Indels 10;



Db 333 ICNHGTCIDSPLSEKAFECQCEPGYEGILCEQDKNE-CLSENMCLNNGTCVNLPG-SFRC 390

:| III : I h :l hll I I 1 1 : 1 |: II; |:|:: III: : : |: I 
Qy 54 VCAHGCC-Q-PSSQSGFTCECEEGWMGPLCDQRTNDPCLG-NKCVH-GTCLPINAFSYSC 109

Db 391 DCARGFGGKWCDEP--L-NMCQDFHCENDGTCMHTSDHSPVCQCKNGFIGKRCEKE 443

I I II III I I II : I : I I :: I |;|::|| I |::| 
Qy 110 KCLEGHGGVLCDEEEDLFNPCQMIKCKH-GKCRLSGVGQPYCECNSGFTGDSCDRE 164



RESULT 2 

ENTRY           A49175 #type fragment
TITLE           Motch B protein - mouse (fragment)
ALTERNATE_NAMES Notch homolog
ORGANISM        #formal_name Mus musculus #common_name house mouse
DATE            21-Jan-1994 #sequence_revision 05-Jan-1996 #text_change
                14-Aug-1998
ACCESSIONS      A49175; PH1570; S32113
REFERENCE       A49175
   #authors     Lardelli, M.; Lendahl, U.
   #journal     Exp. Cell Res. (1993) 204:364-372
   #title       Motch A and Motch B--two mouse Notch homologues coexpressed
                in a wide variety of tissues.
   #cross-references MUID:93178563
   #accession   A49175
   ##status     preliminary; nucleic acid sequence not shown
   ##molecule_type mRNA
   ##residues   1-1203 ##label LAR
   ##cross-references EMBL:X68279; NID:g287989; PID:g287990
   ##experimental_source embryo
   ##note       sequence extracted from NCBI backbone (NCBIP:126158)
COMMENT         This protein has many EGF repeats and lin-12/Notch repeats.
COMMENT         This protein is one of the neurogenic proteins controlling the
                decision between ectodermal and neural fate for cells in the
                early embryo.
CLASSIFICATION  #superfamily unassigned ankyrin repeat proteins; ankyrin
                repeat homology; EGF homology
FEATURE
   560-591      #domain EGF homology #label EGF
SUMMARY         #length 1203 #checksum 910

Query Match 15.2%; Score 290; DB 2; Length 1203;

Best Local Similarity 33.3%; Pred. No. 2.98e-42;

Matches 43; Conservative 34; Mismatches 41; Indels 11; Gaps 10;

Db 255 DNCDPDPCHHGQCQDGIDS-YTCICNPGYMGAICSDQIDE-CYSSPCLNDGRCIDLVN-G 311

' : I I II II : :| :|| |: I ll::| :: :: I :: |:: II:: : 
Qy 48 EPCHKKVCAHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVH-GTCLPINAFS 106

Db 312 YQCNCQPGTSGLNC--EIN-FDDCASNPCMHGVC-VDGINR-YSCVCSPGFTGQRCNIDI 366 

I 1:1 I :|: I I : |: I hi I : |: : I I |::||||: I: :| 
Qy 107 YSCKCLEGHGGVLCDEEEDLFNPCQMIKCKHGKCRLSGVGQPY-CECNSGFTGDSCDREI 165

Db 367 DECASNPCR 375 
I :: U 
Qy 166 S-CRGERIR 173
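
The percentages printed with each result are consistent with two simple ratios: Query Match as the score divided by the Perfect Score of the query (1905 for this search), and Best Local Similarity as the number of matches divided by the total number of alignment columns. The sketch below reproduces the figures for results 1 and 2 under that reading. The formulas are inferred from the numbers in this listing, not taken from program documentation, and the function names are hypothetical.

```python
def query_match_percent(score, perfect_score):
    """Score as a percentage of the query's perfect (self-match) score."""
    return 100.0 * score / perfect_score


def best_local_similarity_percent(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all alignment columns."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Result 2: Score 290; Matches 43; Conservative 34; Mismatches 41; Indels 11
    print(round(query_match_percent(290, 1905), 1))
    print(round(best_local_similarity_percent(43, 34, 41, 11), 1))
```

The same check is useful for validating a result block whose counts were damaged: if the four counts do not reproduce the printed similarity, one of them has probably been misread.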



RESULT 3

ENTRY           A49128 #type complete
TITLE           cell-fate determining gene Notch2 protein - rat
ORGANISM        #formal_name Rattus norvegicus #common_name Norway rat
DATE            21-Jan-1994 #sequence_revision 18-Nov-1994 #text_change
                14-Aug-1998
ACCESSIONS      A49128
REFERENCE       A49128
   #authors     Weinmaster, G.; Roberts, V.J.; Lemke, G.
   #journal     Development (1992) 116:931-941
   #title       Notch2: a second mammalian Notch gene.
   #cross-references MUID:93202015
   #accession   A49128
   ##status     preliminary; not compared with conceptual translation
   ##molecule_type mRNA
   ##residues   1-2471 ##label WEI
   ##experimental_source Schwann cell
   ##note       sequence extracted from NCBI backbone (NCBIP:127811)
CLASSIFICATION  #superfamily unassigned ankyrin repeat proteins; ankyrin
                repeat homology; EGF homology
FEATURE
   1029-1060    #domain EGF homology #label EGF\
   1876-1908    #domain ankyrin repeat homology #label AN1\
   1909-1941    #domain ankyrin repeat homology #label AN2\
   1943-1975    #domain ankyrin repeat homology #label AN3\
   1976-2008    #domain ankyrin repeat homology #label AN4\
   2009-2041    #domain ankyrin repeat homology #label AN5
SUMMARY         #length 2471 #molecular-weight 265367 #checksum 5929

Query Match 15.0%; Score 285; DB 2; Length 2471;

Best Local Similarity 33.3%; Pred. No. 4.14e-41;

Matches 43; Conservative 33; Mismatches 42; Indels 11; Gaps 10;

Db 572 DNCDPDPCHHGQCQDGIDS-YTCICNPGYMGAICSDQIDE-CYSSPCLNDGRCIDLVN-G 628 

: I I II II : :| Ml |: I ||::| :: :: I :: |:: I |: : : 
Qy 48 EPCHKKVCAHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVH-GTCLPINAFS 106 

Db 629 YQCNCQPGTSGLNC--EIN-FDDCASNPCLHGAC-VDGINR-YSCVCSPGFTGQRCNIDI 683 

I IM I M: I I : I: I I II I : |: : I I hMllh I: M 
Qy 107 YSCKCLEGHGGVLCDEEEDLFNPCQMIKCKHGKCRLSGVGQPY-CECNSGFTGDSCDREI 165 

Db 684 DECASNPCR 692 

I :: I 
Qy 166 S-CRGERIR 173 



RESULT 4

ENTRY           S45306 #type complete
TITLE           notch 3 protein - mouse
ORGANISM        #formal_name Mus musculus #common_name house mouse
DATE            20-Feb-1995 #sequence_revision 20-Feb-1995 #text_change
                10-Jul-1998
ACCESSIONS      S45306
REFERENCE       S45306
   #authors     Lardelli, M.; Dahlstrand, J.; Lendahl, U.
   #journal     Mech. Dev. (1994) 46:123-136
   #title       The novel Notch homologue mouse Notch 3 lacks specific
                epidermal growth factor-repeats and is expressed in
                proliferating neuroepithelial.
   #cross-references MUID:95001556
   #accession   S45306
   ##status     preliminary
   ##molecule_type mRNA
   ##residues   1-2318 ##label LAR
   ##cross-references EMBL:X74760; NID:g483580; PID:g483581
CLASSIFICATION  #superfamily unassigned ankyrin repeat proteins; ankyrin
                repeat homology
FEATURE
   1839-1871    #domain ankyrin repeat homology #label AN1\
   1872-1904    #domain ankyrin repeat homology #label AN2\
   1906-1938    #domain ankyrin repeat homology #label AN3\
   1939-1971    #domain ankyrin repeat homology #label AN4\
   1972-2004    #domain ankyrin repeat homology #label AN5
SUMMARY         #length 2318 #molecular-weight 244245 #checksum 9358

Query Match 14.4%; Score 274; DB 2; Length 2318;

Best Local Similarity 36.2%; Pred. No. 1.32e-38;



Db 122 ...
Qy 48 ...

Db 181 ...
Qy 105 ...

Db 236 ...
Qy 163 ...



RESULT 5

ENTRY           S78549 #type complete
TITLE           notch3 protein - human
ORGANISM        #formal_name Homo sapiens #common_name man
DATE            24-Jul-1998 #sequence_revision 24-Jul-1998 #text_change
                17-Mar-1999
ACCESSIONS      S78549; S71825
REFERENCE       S78549
   #authors     Joutel, A.; Tournier-Lasserve, E.
   #submission  submitted to the EMBL Data Library, April 1997
   #accession   S78549
   ##molecule_type mRNA
   ##residues   1-2321 ##label JOU1
   ##cross-references EMBL:U97669; NID:g2668591; PID:g2668592
REFERENCE       S71825
   #authors     Joutel, A.; Corpechot, C.; Ducros, A.; Vahedi, K.; Chabriat,
                H.; Mouton, P.; Alamowitch, S.; Domenga, V.; Cecillion, M.;
                Marechal, E.; Maciazek, J.; Vayssiere, C.; Cruaud, C.;
                Cabanis, E.A.; Ruchoux, M.H.; Weissenbach, J.; Bach, J.F.;
                Bousser, M.G.; Tournier-Lasserve, E.
   #journal     Nature (1996) 383:707-710
   #title       Notch3 mutations in CADASIL, a hereditary adult-onset
                condition causing stroke and dementia.
   #cross-references MUID:97032728
   #accession   S71825
   ##status     nucleic acid sequence not shown
   ##molecule_type DNA
   ##residues   67-113;138-194;268-333,'G',335-346;536-613;716-765;
                1240-1279;1815-1888 ##label JOU2
   ##cross-references EMBL:U97669
GENETICS
   #gene        notch3
   #map_position 19p13.1
FUNCTION
   #description may be involved in pathogenesis of CADASIL, causing a type of
                stroke and dementia
CLASSIFICATION  #superfamily unassigned ankyrin repeat proteins; ankyrin
                repeat homology; EGF homology
KEYWORDS        tandem repeat; transmembrane protein
FEATURE
   318-349      #domain EGF homology #label EGF\
   1838-1870    #domain ankyrin repeat homology #label AN1\
   1871-1903    #domain ankyrin repeat homology #label AN2\
   1905-1937    #domain ankyrin repeat homology #label AN3\
   1938-1970    #domain ankyrin repeat homology #label AN4\
   1971-2003    #domain ankyrin repeat homology #label AN5
SUMMARY         #length 2321 #molecular-weight 243657 #checksum 3337

Query Match 14.4%; Score 275; DB 2; Length 2321;
Best Local Similarity 36.9%; Pred. No. 7.83e-39;
Matches 48; Conservative 20; Mismatches 50; Indels 12; Gaps 11;



Db 121 DPCLSSPCAHGARCSVGPDGRFLCSCPPGYQGRSC-RSDVDECRVGEPCRHGGTCLNTPG 179
:l! Nil I : I I I I I I : I I :|: I II III : 
Qy 48 EPCHKKVCAHGC-CQPSSQSGFTCECEEGWMGPLCDQRTNDPC-LGNKCVHG-TCLPINA 104 



Db 180 -SFRCQCPAGYTGPLC-ENPAV-PCAPSPCRNGGTCRQSG-DLTYDCACLPGFEGQNCE 234

M I I I II I: : II |::| || || Ml 
Qy 105 FSYSCKCLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPY-CECNSGFTGDSCD 162



Db 235 VNVDDCPGHR 244 

:: I I I 
Qy 163 REIS-CRGER 171 



RESULT 6 

ENTRY           A56136 #type complete
TITLE           jagged protein precursor - rat
ORGANISM        #formal_name Rattus norvegicus #common_name Norway rat
DATE            28-Apr-1995 #sequence_revision 28-Apr-1995 #text_change
                11-Aug-1995
ACCESSIONS      A56136
REFERENCE       A56136
   #authors     Lindsell, C.E.; Shawber, C.J.; Boulter, J.; Weinmaster, G.
   #journal     Cell (1995) 80:909-917
   #title       Jagged: a mammalian ligand that activates Notch1.
   #cross-references MUID:95211842
   #accession   A56136
   ##status     preliminary
   ##molecule_type mRNA
   ##residues   1-1220 ##label LIN
   ##cross-references GB:L38483
SUMMARY         #length 1220 #molecular-weight 134528 #checksum 2746

Query Match 14.1%; Score 269; DB 2; Length 1220;
Best Local Similarity 37.8%; Pred. No. 1.79e-37;
Matches 45; Conservative 19; Mismatches 45; Indels 10;



Db 598 NVCGPHGKCKSESGGKFTCDCNKGFTGTYCHENIND-CEGNPCTNGGTCID-GVNSYKCI 655

:lh 111:1: llhl: I I I : 1 1 I 1 1 I : I 1 1 : III 
Qy 53 KVCA-HGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 110

Db 656 CSDGWEGAHC--ENNI-NDCSQNPCHYGGTCR-DLVNDFYCDCKNGWKGKTCHSRDSQC 710

I M I I I::: II M II I ||:|::| I :| I : | 
Qy 111 CLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPYCECNSGFTGDSCD-REISC 167



RESULT 7 
ENTRY B36665 #type complete
TITLE slit protein 2 precursor - fruit fly (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change 16-Dec-1998
ACCESSIONS B36665
REFERENCE A36665
#authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.;
Artavanis-Tsakonas, S.
#journal Genes Dev. (1990) 4:2169-2187
#title slit: an extracellular protein necessary for development of
midline glia and commissural axon pathways contains both
EGF and LRR domains.
#cross-references MUID:91099665
#accession B36665
##status preliminary
##molecule_type mRNA
##residues 1-1469 ##label ROT
##cross-references GB:X53959
GENETICS
#gene FlyBase:sli
##cross-references FlyBase:FBgn0003425
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF
homology; leucine-rich alpha-2-glycoprotein repeat
homology; proteoglycan carboxyl-terminal homology

FEATURE
66-91 #domain proteoglycan amino-terminal homology #label PAH1\
101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\
288-313 #domain proteoglycan amino-terminal homology #label PAH2\
323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\
512-537 #domain proteoglycan amino-terminal homology #label PAH3\
547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\
708-733 #domain proteoglycan amino-terminal homology #label PAH4\
743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
890 #domain proteoglycan carboxyl-terminal homology #label PCS4\
1028-1061 #domain EGF homology #label EGF

SUMMARY #length 1469 #molecular-weight 164695 #checksum 8361

Query Match 13.9%; Score 264; DB 2; Length 1469;

Best Local Similarity 34.1%; Pred. No. 2.41e-36;

Matches 42; Conservative 27; Mismatches 44; Indels 10; Gaps 9;

Db 1351 EEPVDPCLENKCRRGSRCVPNSNARDGYQCKCKHGQRGRYCDQAASTCRKEQVREYY-TE 1409

I: :M II::: :l I : I I I : I I 1 1 : I 1 1 |::|:|l : 
Qy 124 EDLFNPCQMIKCKHG-KC-RLSGVGQPY-CECNSGFTGDSCDREIS-CRGERIRDYYQKQ 179

Db 1410 NDCRSRQPLK-YAK--CVGGC-GNQCCAAKIVRRRKVRMVCSNNRKYIKNLDIVRRCGCT 1465

: I I :: I III I III:: :||| I:: :: ::: I !ll|: 
Qy 180 QGYAACQTTKKVSRLECRGGCAGGQCCGPLRSKRRKYSFECTDGSSFVDEVEKWRCGCA 239

Db 1466 KKC 1468

: I 

Qy 240 R-C 241



RESULT 8 

ENTRY A35844 #type complete
TITLE Xotch protein - African clawed frog
ORGANISM #formal_name Xenopus laevis #common_name African clawed frog
DATE 12-Oct-1990 #sequence_revision 12-Oct-1990 #text_change 14-Aug-1998
ACCESSIONS A35844
REFERENCE A35844
#authors Coffman, C.; Harris, W.; Kintner, C.
#journal Science (1990) 249:1438-1441
#title Xotch, the Xenopus homolog of Drosophila notch.
#cross-references MUID:90385285
#accession A35844
##status preliminary; nucleic acid sequence not shown; not
compared with conceptual translation
##molecule_type mRNA
##residues 1-2524 ##label COF
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology; EGF homology
KEYWORDS transmembrane protein

FEATURE
222-254 #domain EGF homology #label EGF\
1924-1956 #domain ankyrin repeat homology #label AN1\
1957-1989 #domain ankyrin repeat homology #label AN2\
1991-2023 #domain ankyrin repeat homology #label AN3\
2024-2056 #domain ankyrin repeat homology #label AN4\
2057-2089 #domain ankyrin repeat homology #label AN5

SUMMARY #length 2524 #molecular-weight 274931 #checksum 9441

Query Match 13.8%; Score 262; DB 2; Length 2524;

Best Local Similarity 31.9%; Pred. No. 6.82e-36;

Matches

Db 714

I: : I III: : :|: hi! II h II I: I :| |::| II 
Qy 44

Db 772

: :| I I I :| I : :: I I Ml ! ; |: I |:| :|| 
Qy 103

Db 826

Qy 161

RESULT 9 

ENTRY S18188 #type complete
TITLE notch protein homolog - rat
ORGANISM #formal_name Rattus norvegicus #common_name Norway rat
DATE 19-Feb-1994 #sequence_revision 10-Nov-1995 #text_change 12-Feb-1999
ACCESSIONS S18188
REFERENCE S18188
#authors Weinmaster, G.; Roberts, V.J.; Lemke, G.
#journal Development (1991) 113:199-205
#title A homolog of Drosophila Notch expressed during mammalian
development.
#cross-references MUID:92111383
#accession S18188
##molecule_type mRNA
##residues 1-2531 ##label WEI
##cross-references EMBL:X57405; NID:g57634; PID:g57635
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology

FEATURE
1917-1949 #domain ankyrin repeat homology #label AN1\
1950-1982 #domain ankyrin repeat homology #label AN2\
1984-2016 #domain ankyrin repeat homology #label AN3\
2017-2049 #domain ankyrin repeat homology #label AN4\
2050-2082 #domain ankyrin repeat homology #label AN5

SUMMARY #length 2531 #molecular-weight 270907 #checksum 2705

Query Match 13.6%; Score 260; DB 2; Length 2531;

Best Local Similarity 33.6%; Pred. No. 1.92e-35;

Matches 41; Conservative 27; Mismatches 43; Indels 11; Gaps 10;

Db 714 LSEVNECNSNPCIHGACR-DGLNGYKCDCAPGWSGTNCDINNNE-CESNPCVNGGTCKDM 771

I: : I: : I II I: : :|: hi II III hi :| Ihl II : 
Qy 44 LPGCEPCHKKVCAHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPI 102

Db 772 TS-GYVCTCREGFSGPNC-QTNI-NECASNPCLNQGTC-IDDVA-GYKCNCPLPYTGAT 825

: :| I I II :| I : :: I I I ::| I : h I hi :li : 
Qy 103 NAFSYSCKCLEGHGGVLCDEEEDLFNPCQMIKC-KHGKCRLSGVGQPY-CECNSGFTGDS 160

Db 826 CE 827
h 

Qy 161 CD 162



RESULT 10 

ENTRY A46019 #type complete
TITLE gene Notch-1 protein - mouse
ORGANISM #formal_name Mus musculus #common_name house mouse
DATE 22-Sep-1993 #sequence_revision 18-Nov-1994 #text_change 14-Aug-1998
ACCESSIONS A46019
REFERENCE A46019
#authors del Amo, F.F.; Gendron-Maguire, M.; Swiatek, P.J.; Jenkins,
N.A.; Copeland, N.G.; Gridley, T.
#journal Genomics (1993) 15:259-264
#title Cloning, analysis, and chromosomal localization of Notch-1, a

Tue Jun 1 10:16:07 1999    US-09-191-647-14.rpr    Page 5

mouse homolog of Drosophila Notch.
#cross-references MUID:93194170
#accession A46019
##status preliminary; not compared with conceptual translation
##molecule_type nucleic acid
##residues 1-2531 ##label DEL
##cross-references GB:Z11886; GB:S47228; NID:g288502; PID:g288503
##note sequence extracted from NCBI backbone (NCBIP:127318)
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology; EGF homology

FEATURE
757-788 #domain EGF homology #label EGF\
1917-1948 #domain ankyrin repeat homology #label AN1\
1949-1981 #domain ankyrin repeat homology #label AN2\
1983-2015 #domain ankyrin repeat homology #label AN3\
2016-2048 #domain ankyrin repeat homology #label AN4\
2049-2081 #domain ankyrin repeat homology #label AN5

SUMMARY #length 2531 #molecular-weight 271312 #checksum 6611

Query Match 13.6%; Score 260; DB 2; Length 2531;

Best Local Similarity 33.6%; Pred. No. 1.92e-35;

Matches 41; Conservative 27; Mismatches 43; Indels 11; Gaps 10;

Db 714 LSEVNECNSNPCIHGACR-DGLNGYKCDCAPGWSGTNCDINNNE-CESNPCVNGGTCKDM 771

I: : I: : I II h : :|: |:| II III |: I :| ||:| II : 
Qy 44 LPGCEPCHKKVCAHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPI 102

Db 772 TS-GYVCTCREGFSGPNC--QTNI-NECASNPCLNQGTC-IDDVA-GYKCNCPLPYTGAT 825

: :| I I II :| I : :: I I I ::| I : |: I |:| :|| : 
Qy 103 NAFSYSCKCLEGHGGVLCDEEEDLFNPCQMIKC-KHGKCRLSGVGQPY-CECNSGFTGDS 160

Db 826 CE 827

I: 

Qy 161 CD 162



RESULT 11 

ENTRY A31640 #type fragment
TITLE epidermal growth factor-like protein slit - fruit fly
(Drosophila melanogaster) (fragment)
ORGANISM #formal_name Drosophila melanogaster
DATE 28-Feb-1990 #sequence_revision 28-Feb-1990 #text_change 14-Aug-1998
ACCESSIONS A31640
REFERENCE A31640
#authors Rothberg, J.M.; Hartley, D.A.; Walther, Z.;
Artavanis-Tsakonas, S.
#journal Cell (1988) 55:1047-1059
#title slit: An EGF-homologous locus of D. melanogaster involved in
the development of the embryonic central nervous system.
#cross-references MUID:89077533
#accession A31640
##molecule_type DNA
##residues 1-530 ##label ROT
##cross-references GB:M23543; NID:g340939; PID:g514357
GENETICS
#gene FlyBase:sli
##cross-references FlyBase:FBgn0003425
#introns 470/3
CLASSIFICATION #superfamily EGF homology
KEYWORDS growth factor

FEATURE
148-181 #domain EGF homology #label EGF

SUMMARY #length 530 #checksum 6330

Query Match 13.3%; Score 254; DB 2; Length 530;

Best Local Similarity 32.6%; Pred. No. 4.27e-34;

Matches 45; Conservative 30; Mismatches 51; Indels 12; Gaps 11;

Db 20 VRNDILAKCNACFEQPCQNQAQCVALPQREYQCLCQPGYHGKHCEFMI-DACYGNPCRNN 78

II: |::| I : : I : : I : I I : I I I : MM!: 
Qy 39 MQTGILPGCEPCHKKVC-AHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKC-VH 96

Db 79 ATCTVLEEGRFSCQCAPGYTGARCETNID--D-CLGEIKCQNNATC-IDGV-ESYKCECQ 133

:N :ll I I I I: : I : I III ::: I : II ::| III 
Qy 97 GTCLPINAFSYSCKCLEGHGGVLCDEEEDLFNPCQ-MIKC-KHGKCRLSGVGQPY-CECN 153

Db 134 PGFSGEFCDTKIQFCSPE 151

:!!:!: II I I I 
Qy 154 SGFTGDSCDREIS-CRGE 170



RESULT 12 

ENTRY A36665 #type complete
TITLE slit protein 1 precursor - fruit fly (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 30-Apr-1991 #sequence_revision 30-Apr-1991 #text_change 24-Sep-1998
ACCESSIONS A36665; S13523
REFERENCE A36665
#authors Rothberg, J.M.; Jacobs, J.R.; Goodman, C.S.;
Artavanis-Tsakonas, S.
#journal Genes Dev. (1990) 4:2169-2187
#title slit: an extracellular protein necessary for development of
midline glia and commissural axon pathways contains both
EGF and LRR domains.
#cross-references MUID:91099665
#accession A36665
##status preliminary
##molecule_type mRNA
##residues 1-1480 ##label ROT
##cross-references GB:X53959; NID:g8614; PID:g8615
GENETICS
#gene FlyBase:sli
##cross-references FlyBase:FBgn0003425
CLASSIFICATION #superfamily proteoglycan amino-terminal homology; EGF
homology; leucine-rich alpha-2-glycoprotein repeat
homology; proteoglycan carboxyl-terminal homology
KEYWORDS alternative splicing

FEATURE
66-91 #domain proteoglycan amino-terminal homology #label PAH1\
101-124 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR1\
125-148 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR2\
149-172 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR3\
173-196 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR4\
197-220 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR5\
228-272 #domain proteoglycan carboxyl-terminal homology #label PCS1\
288-313 #domain proteoglycan amino-terminal homology #label PAH2\
323-346 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR6\
347-370 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR7\
371-394 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR8\
395-418 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LRR9\
419-442 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR10\
450-494 #domain proteoglycan carboxyl-terminal homology #label PCS2\
512-537 #domain proteoglycan amino-terminal homology #label PAH3\
547-571 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR11\
572-595 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR12\
596-619 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR13\
620-643 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR14\
651-695 #domain proteoglycan carboxyl-terminal homology #label PCS3\
708-733 #domain proteoglycan amino-terminal homology #label PAH4\
743-766 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR15\
767-790 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR16\
791-814 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR17\
815-838 #domain leucine-rich alpha-2-glycoprotein repeat homology #label LR18\
846-890 #domain proteoglycan carboxyl-terminal homology #label PCS4\
1028-1061 #domain EGF homology #label EGF

SUMMARY #length 1480 #molecular-weight 165751 #checksum 900

Query Match 13.3%; Score 254; DB 2; Length 1480;

Best Local Similarity 32.6%; Pred. No. 4.27e-34;

Matches 45; Conservative 30; Mismatches 51; Indels 12; Gaps 11;

Db 900 VRNDILAKCNACFEQPCQNQAQCVALPQREYQCLCQPGYHGKHCEFMI-DACYGNPCRNN 958

II: 1-1 I : = I : : I : I I : I I I : |:| II I : 
Qy 39 MQTGILPGCEPCHKKVC-AHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKC-VH 96

Db 959 ATCTVLEEGRFSCQCAPGYTGARCETNID--D-CLGEIKCQNNATC-IDGV-ESYKCECQ 1013

:M " :ll I I I I: : I : I III ::: I : II ::| III 
Qy 97 GTCLPINAFSYSCKCLEGHGGVLCDEEEDLFNPCQ-MIKC-KHGKCRLSGVGQPY-CECN 153

Db 1014 PGFSGEFCDTKIQFCSPE 1031

:M:|: Mill 
Qy 154 SGFTGDSCDREIS-CRGE 170



RESULT 13 

ENTRY S42612 #type complete
TITLE transmembrane protein precursor - zebra fish
ORGANISM #formal_name Brachydanio rerio #common_name zebra fish
DATE 20-Feb-1995 #sequence_revision 20-Feb-1995 #text_change 10-Jul-1998
ACCESSIONS S42612
REFERENCE S42612
#authors Bierkamp, C.; Campos-Ortega, J.A.
#journal Mech. Dev. (1993) 43:87-100
#title A zebrafish homologue of the Drosophila neurogenic gene Notch
and its pattern of transcription during early
embryogenesis.
#cross-references MUID:94128602
#accession S42612
##status preliminary
##molecule_type mRNA
##residues 1-2437 ##label BIE
##cross-references EMBL:X69088; NID:g433866; PID:g433867
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology

FEATURE
1915-1947 #domain ankyrin repeat homology #label AN1\
1948-1980 #domain ankyrin repeat homology #label AN2\
1982-2014 #domain ankyrin repeat homology #label AN3\
2015-2047 #domain ankyrin repeat homology #label AN4\
2048-2080 #domain ankyrin repeat homology #label AN5

SUMMARY #length 2437 #molecular-weight 262306 #checksum 4021

Query Match 13.2%; Score 251; DB 2; Length 2437;

Best Local Similarity 34.4%; Pred. No. 2.00e-33;

Matches 42; Conservative 23; Mismatches 45; Indels 12; Gaps 11;

Db 718 CSSNPCIHGSCLDQINS-YRCVCEAGWMGRNCDININE-CLSNPCVNGGTCKDMTS-GYL 774

I Mill I : I II INI II I: M: ||:| || : : :| 
Qy 50 CHKKVCAHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYS 108

Db 775 CTCRAGFSGPNC-QMNI-NECASNPCLNQGSC-IDDVA-GFKCNCMLPYTGEVCENVLA 829

I :l i ::| I : I: : 1:1 :||: |: :: 

Qy 109 CKCLEGHGGVLCDEEEDLFNPCQMIKC-KHGKCRLSGVGQPY-CECNSGFTGDSCDREIS 166

Db 830 PC 831

I 

Qy 167 -C 167



RESULT 14 

ENTRY A35672 #type complete
TITLE crumbs protein - fruit fly (Drosophila melanogaster)
ORGANISM #formal_name Drosophila melanogaster
DATE 21-Sep-1990 #sequence_revision 18-Nov-1992 #text_change 14-Aug-1998
ACCESSIONS A35672
REFERENCE A35672
#authors Tepass, U.; Theres, C.; Knust, E.
#journal Cell (1990) 61:787-799
#title crumbs encodes an EGF-like protein expressed on apical
membranes of Drosophila epithelial cells and required for
organization of epithelia.
#cross-references MUID:90263104
#accession A35672
##status preliminary
##molecule_type mRNA
##residues 1-2139 ##label TEP
##cross-references GB:M33753
##note the authors translated the codon GGC for residue 1928 as
Cys, and TAT for residue 2023 as Gln
GENETICS
#gene FlyBase:crb
##cross-references FlyBase:FBgn0000368
CLASSIFICATION #superfamily EGF homology
KEYWORDS transmembrane protein

FEATURE
691-722 #domain EGF homology #label EGF

SUMMARY #length 2139 #molecular-weight 233619 #checksum 7230

Query Match 12.9%; Score 245; DB 2; Length 2139;

Best Local Similarity 31.7%; Pred. No. 4.36e-32;

Matches 40; Conservative 23; Mismatches 57; Indels 6; Gaps 6;

Db 271 CLNDPCMGHGTC-SSSPEGYECRCTARYSGKNCQKDNGSPCAKNPCENGGSCLENSEGNY 329

: I :|: i :M I: I I |: II I I :| :|| : :| 

Qy 50 CHKKVC-AHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSY 107

Db 330 QCFCDPNHSGQHCETEVNIHPLCQTNPCLNNGACWIGGSGALTCECPKGYAGARCEVDT 389

I I hi I: I :: II I ::| I ::| I III |::| |: : 
Qy 108 SCKCLEGHGGVLCDEEEDLFNPCQMIKC-KHGKCR-LSGVGQPYCECNSGFTGDSCDREI 165

Db 390 DECASQ 395

I :: 

Qy 166 S-CRGE 170



RESULT 15 

ENTRY A40043 #type complete
TITLE notch protein homolog TAN-1 precursor - human
ORGANISM #formal_name Homo sapiens #common_name man
DATE 21-Apr-1992 #sequence_revision 21-Apr-1992 #text_change 14-Aug-1998
ACCESSIONS A40043
REFERENCE A40043
#authors Ellisen, L.W.; Bird, J.; West, D.C.; Soreng, A.L.; Reynolds,
T.C.; Smith, S.D.; Sklar, J.
#journal Cell (1991) 66:649-661
#title TAN-1, the human homolog of the Drosophila Notch gene, is
broken by chromosomal translocations in T lymphoblastic

#cross-references MUID:91347367
#accession A40043
##status preliminary; nucleic acid sequence not shown; not
compared with conceptual translation
##molecule_type mRNA
##residues 1-2555 ##label ELL
##cross-references GB:M73980
CLASSIFICATION #superfamily unassigned ankyrin repeat proteins; ankyrin
repeat homology; EGF homology

FEATURE
1149-1180 #domain EGF homology #label EGF\
1927-1959 #domain ankyrin repeat homology #label AN1\
1960-1992 #domain ankyrin repeat homology #label AN2\
1994-2026 #domain ankyrin repeat homology #label AN3\
2027-2059 #domain ankyrin repeat homology #label AN4\
2060-2092 #domain ankyrin repeat homology #label AN5

SUMMARY #length 2555 #molecular-weight 272337 #checksum 463

Query Match 12.9%; Score 246; DB 2; Length 2555;

Best Local Similarity 33.6%; Pred. No. 2.61e-32;

Matches 41; Conservative 26; Mismatches 44; Indels 11;

Db 713 LSEVNECNSNPCVHGACR-DSLNGYKCDCDPGWSGTNCDINNNE-CESNPCVNGGTCKDM 770

h : h : I II I: I :|: |:|: II I II |: I :| l|:| II : 
Qy 44 LPGCEPCHKKVCAHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPI 102

Db 771 TS-GIVCTCREGFSGPNC--QTNI-NECASNPCLNKGTC-IDDVA-GYKCNCLLPYTGAT 824

: : I I II :| I : :: II I : I I : |: | |:| :|| : 
Qy 103 NAFSYSCKCLEGHGGVLCDEEEDLFNPCQMIKC-KHGKCRLSGVGQPY-CECNSGFTGDS 160

Db 825 CE 826

i: 

Qy 161 CD 162



Search completed: Fri May 28 09:41:04 1999
Job time : 40 secs.



Tue Jun 1 10:16:08 1999 



US-09-191-647-14.rsp



Page 1 



Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on: Fri May 28 09:41:22 1999; MasPar time 8.63 Seconds

795.986 Million cell updates/sec

Tabular output not generated.

Title: >US-09-191-647-14
Description: (1-243) from US09191647.pep
Perfect Score: 1905
Sequence: 1 ILDVASLRQAPGENGTSFHG SSFVDEVEKWKCGCARCAS 243

Scoring table: PAM 150
Gap 11

Searched: 77977 seqs, 28268293 residues

Post-processing: Minimum Match 0%

Listing first 45 summaries

Swiss-prot37
1:swissprot

Statistics: Mean 41.939; Variance 61.445; scale 0.683

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
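
The Pred. No. formula itself is not given in this listing, so it is not reproduced here. The two percentage fields printed with each result can, however, be checked by hand. The sketch below assumes (an inference from the printed numbers, not something the program output states) that Query Match is the hit's score as a percentage of the Perfect Score, and that Best Local Similarity is Matches as a percentage of all alignment columns (Matches + Conservative + Mismatches + Indels). The function names are hypothetical.

```python
def query_match_percent(score: int, perfect_score: int) -> float:
    """Hit score as a percentage of the query's perfect score (assumed definition)."""
    if perfect_score <= 0:
        raise ValueError("perfect_score must be positive")
    return 100.0 * score / perfect_score


def local_similarity_percent(matches: int, conservative: int,
                             mismatches: int, indels: int) -> float:
    """Identical columns as a percentage of all alignment columns (assumed definition)."""
    columns = matches + conservative + mismatches + indels
    if columns <= 0:
        raise ValueError("alignment must contain at least one column")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Values printed for RESULT 1 of this search (Perfect Score 1905).
    print(round(query_match_percent(292, 1905), 1))
    print(round(local_similarity_percent(47, 24, 35, 10), 1))
```

With the values printed for RESULT 1 of this search (Score 292, Perfect Score 1905, Matches 47, Conservative 24, Mismatches 35, Indels 10), these functions reproduce the listed 15.3% and 40.5%.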



SUMMARIES

            Query
No.  Score  Match  Length  DB  ID          Description             Pred. No.
  1    292   15.3    1429   1  LI12_CAEEL  LIN-12 PROTEIN PRECURS  9.66e-48
  2    274   14.4    2318   1  NTC3_MOUSE  NEUROGENIC LOCUS NOTCH  3.65e-43
  3    262   13.8    2524   1  NOTC_XENLA  NEUROGENIC LOCUS NOTCH  3.85e-40
  4    260   13.6    2531   1  NTC1_MOUSE  NEUROGENIC LOCUS NOTCH  1.22e-39
  5    260   13.6    2531   1  NTC1_RAT    NEUROGENIC LOCUS NOTCH  1.22e-39
  6    254   13.3    1480   1  SLIT_DROME  SLIT PROTEIN PRECURSOR  3.86e-38
  7    251   13.2    2437   1  NOTC_BRARE  NEUROGENIC LOCUS NOTCH  2.16e-37
  8    245   12.9    2139   1  CRB_DROME   CRUMBS PROTEIN PRECURS  6.68e-36
  9    246   12.9    2444   1  NTC1_HUMAN  NEUROGENIC LOCUS NOTCH  3.78e-36
 10    244   12.8    2703   1  NOTC_DROME  NEUROGENIC LOCUS NOTCH  1.18e-35
 11    242   12.7     570   1  FBP3_STRPU  FIBROPELLIN C PRECURSO  3.69e-35
 12    241   12.7    1964   1  NTC4_MOUSE  NEUROGENIC LOCUS NOTCH  6.52e-35
 13    239   12.5    1295   1  GLP1_CAEEL  GLP-1 PROTEIN PRECURSO  2.03e-34
 14    234   12.3     880   1  DL_DROME    NEUROGENIC LOCUS DELTA  3.45e-33
 15    230   12.1    1064   1  FBP1_STRPU  FIBROPELLIN I PRECURSO  3.30e-32
 16    231   12.1    4393   1  PGBM_HUMAN  BASEMENT MEMBRANE-SPEC  1.88e-32
 17    215   11.3     714   1  DLL1_RAT    DELTA-LIKE PROTEIN 1 P  1.45e-28
 18    213   11.2     722   1  DLL1_MOUSE  DELTA-LIKE PROTEIN 1 P  4.41e-28
 19    211   11.1     723   1  DLL1_HUMAN  DELTA-LIKE PROTEIN 1 P  1.33e-27
 20    212   11.1    5147   1  FAT_DROME   CADHERIN-RELATED TUMOR  7.67e-28
 21    208   10.9    3707   1  PGBM_MOUSE  BASEMENT MEMBRANE-SPEC  6.98e-27
 22    203   10.7    1408   1  SERR_DROME  SERRATE PROTEIN PRECUR  1.09e-25
 23    197   10.3     383   1  DLK_HUMAN   DELTA-LIKE PROTEIN PRE  2.87e-24
 24    197   10.3    1268   1
 25    195   10.2    1257   1  PGCN_RAT    WPriRiratJ rnpp dratftn  3.51e-24
 26    188    9.9    3396   1  PGCV_HUMAN  VERSICAN CORE PROTEIN   3.72e-22
 27    185    9.7    1959   1  1I7PT OUT   rtdtm ddewtdcad
 28    185    9.7    3358   1              VERSICAN CORE PROTEIN   1.86e-21
 29    183    9.6     862   1  ocrv vac hit  VERSICAN CORE PROTEIN
 30    179    9.4     385   1  DLK_MOUSE   DELTA-LIKE PROTEIN PRE  4.52e-20
 31    177    9.3    1394   1  TGFB_HUMAN  LATENT TRANSFORMING GR  1.30e-19
 32    177    9.3    1712   1  TfiPR RAT                           1.30e-19
 33    178    9.3    2871   1  FBN1_BOVIN  FIBRILLIN 1 PRECURSOR   7.68e-20
 34    177    9.3    2871   1  FBN1_HUMAN  FIBRILLIN 1 PRECURSOR.  1.30e-19
 35    176    9.2    1955   1  AGRI_CHICK  AGRIN PRECURSOR.        2.21e-19
 36    175    9.2    3562   1  PGCV_CHICK  VERSICAN CORE PROTEIN   3.74e-19
 37    172    9.0    2911   1  FBN2_HUMAN  FIBRILLIN 2 PRECURSOR.  1.81e-18
 38    170    8.9    2871   1  FBN1_MOUSE  FIBRILLIN 1 PRECURSOR.  5.14e-18
 39    166    8.7     379   1  CYR6_MOUSE  CYR61 PROTEIN PRECURSO  4.11e-17
 40    165    8.7     458   1  PRTC_RABIT  VITAMIN-K DEPENDENT PR  6.90e-17
 41    162    8.5     375   1  CE10_CHICK  CEF-10 PROTEIN PRECURS  3.24e-16
 42    161    8.5    2907   1  FBN2_MOUSE  FIBRILLIN 2 PRECURSOR.  5.41e-16
 43    156    8.2    1328   1  AGRI_DISOM  AGRIN (FRAGMENT).       6.95e-15
 44    153    8.0     381   1  CYR6_HUMAN  CYR61 PROTEIN PRECURSO  3.17e-14
 45    151    7.9     515   1  APX1_CAEEL  APX-1 PROTEIN PRECURSO  8.67e-14



RESULT 1

ID LI12_CAEEL STANDARD; PRT; 1429 AA.
AC P14585;
DT 01-JAN-1990 (REL. 13, CREATED)
DT 01-JAN-1990 (REL. 13, LAST SEQUENCE UPDATE)
DT 01-OCT-1996 (REL. 34, LAST ANNOTATION UPDATE)
DE LIN-12 PROTEIN PRECURSOR.
GN LIN-12 OR R107.8.
OS CAENORHABDITIS ELEGANS.
OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA;
OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=BRISTOL N2;
RX MEDLINE; 88334747.
RA YOCHEM J., WESTON K., GREENWALD I.;
RT "The Caenorhabditis elegans lin-12 gene encodes a transmembrane
RT protein with overall similarity to Drosophila Notch.";
RL NATURE 335:547-550(1988).
RN [2]
RP SEQUENCE FROM N.A.
RC STRAIN=BRISTOL N2;
RX MEDLINE; 94150718.
RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,
RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,
RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FRASER A.,
RA FULTON L., GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M.,
RA JOHNSTON L., JONES M., KERSHAW J., KIRSTEN J., LAISSTER N.,
RA LATREILLE P., LIGHTNING J., LLOYD C., MORTIMORE B., O'CALLAGHAN M.,
RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,
RA SIMS M., SMALDON N., SMITH A., SMITH M., SONNHAMMER E., STADEN R.,

RA WOHLDMAN P.;
RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.

CC -!- FUNCTION: LIN-12 IS INVOLVED IN SEVERAL CELL FATES DECISIONS
CC THAT REQUIRES CELL-CELL INTERACTIONS. IT IS POSSIBLE THAT LIN-12
CC ENCODES A MEMBRANE-BOUND RECEPTOR FOR A SIGNAL THAT ENABLES
CC EXPRESSION OF THE VENTRAL UTERINE PRECURSOR CELL FATE.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- SIMILARITY: HIGH, TO C. ELEGANS GLP-1.
CC -!- SIMILARITY: CONTAINS 13 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.



CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; M12069; G156358; -.
DR EMBL; Z14092; E1348691; -.
DR PIR; S06434; S06434.
DR WORMPEP; R107.8; CE00274.
DR PROSITE; PS00010; ASX_HYDROXYL; 3.
DR PROSITE; PS00022; EGF_1; 12.
DR PROSITE; PS01186; EGF_2; 11.
DR PROSITE; PS01187; EGF_CA; 2.
DR PFAM; PF00008; EGF; 13.
DR PFAM; PF00023; ank; 4.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE;
KW GLYCOPROTEIN; SIGNAL.



FT   SIGNAL        1     15       POTENTIAL.
FT   CHAIN        16   1429       LIN-12 PROTEIN.
FT   DOMAIN       16    908       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    909    931       POTENTIAL.
FT   DOMAIN      932   1429       CYTOPLASMIC (POTENTIAL).
FT   DOMAIN       24    618       13 X EGF-LIKE REPEATS.
FT   DOMAIN      631    750       3 X LIN/NOTCH REPEATS.
FT   DOMAIN     1046   1266       6 X ANK MOTIF REPEATS.
FT   DOMAIN       20     61       EGF-LIKE 1.
FT   DOMAIN      114    150       EGF-LIKE 2.
FT   DOMAIN      152    190       EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      201    246       EGF-LIKE 4.
FT   DOMAIN      250    285       EGF-LIKE 5.
FT   DOMAIN      287    323       EGF-LIKE 6.
FT   DOMAIN      323    363       EGF-LIKE 7.
FT   DOMAIN      365    402       EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT   DOMAIN      404    441       EGF-LIKE 9.
FT   DOMAIN      449              EGF-LIKE 10.
FT   DOMAIN      503    541       EGF-LIKE 11.
FT   DOMAIN      543    579       EGF-LIKE 12.
FT   DOMAIN      582    619       EGF-LIKE 13.


FT   REPEAT      635    669       LIN/NOTCH 1.
FT   REPEAT      670    710       LIN/NOTCH 2.
FT   REPEAT      711    750       LIN/NOTCH 3.
FT   REPEAT     1046   1078       ANK MOTIF 1.
FT   REPEAT     1079   1119       ANK MOTIF 2.
FT   REPEAT     1120   1152       ANK MOTIF 3.
FT   REPEAT     1153   1188       ANK MOTIF 4.
FT   REPEAT     1189   1232       ANK MOTIF 5.
FT   REPEAT     1233   1266       ANK MOTIF 6.
FT   DISULFID     24     35       BY SIMILARITY.
FT   DISULFID     29     49       BY SIMILARITY.
FT   DISULFID     51     60       BY SIMILARITY.
FT   DISULFID    118    129       BY SIMILARITY.
FT   DISULFID    123    138       BY SIMILARITY.
FT   DISULFID    140    149       BY SIMILARITY.
FT   DISULFID    156    169       BY SIMILARITY.
FT   DISULFID    163    178       BY SIMILARITY.
FT   DISULFID    180    189       BY SIMILARITY.
FT   DISULFID    205    227       BY SIMILARITY.
FT   DISULFID    221    234       BY SIMILARITY.
FT   DISULFID    236    245       BY SIMILARITY.
FT   DISULFID    254    264       BY SIMILARITY.
FT   DISULFID    259    273       BY SIMILARITY.
FT   DISULFID    275    284       BY SIMILARITY.
FT   DISULFID    291    302       BY SIMILARITY.
FT   DISULFID    296    311       BY SIMILARITY.
FT   DISULFID    313    322       BY SIMILARITY.
FT   DISULFID    327    339       BY SIMILARITY.
FT   DISULFID    334    351       BY SIMILARITY.
FT   DISULFID    353    362       BY SIMILARITY.
FT   DISULFID    369    381       BY SIMILARITY.
FT   DISULFID    375    390       BY SIMILARITY.
FT   DISULFID    392    401       BY SIMILARITY.
FT   DISULFID    408    419       BY SIMILARITY.
FT   DISULFID    413    429       BY SIMILARITY.
FT   DISULFID    431    440       BY SIMILARITY.
FT   DISULFID    507    518       BY SIMILARITY.
FT   DISULFID    512    529       BY SIMILARITY.
FT   DISULFID    531    540       BY SIMILARITY.
FT   DISULFID    547    558       BY SIMILARITY.
FT   DISULFID    552    567       BY SIMILARITY.
FT   DISULFID    569    578       BY SIMILARITY.
FT   DISULFID    586    597       BY SIMILARITY.
FT   DISULFID    591    607       BY SIMILARITY.
FT   DISULFID    609    618       BY SIMILARITY.
FT   CARBOHYD     41     41       POTENTIAL.
FT   CARBOHYD    165    165       POTENTIAL.
FT   CARBOHYD    194    194       POTENTIAL.
FT   CARBOHYD    378    378       POTENTIAL.
FT   CARBOHYD    515    515       POTENTIAL.
FT   CARBOHYD    623    623       POTENTIAL.
FT   CARBOHYD    751    751       POTENTIAL.
FT   CARBOHYD    754    754       POTENTIAL.
FT   CARBOHYD    900    900       POTENTIAL.
SQ   SEQUENCE   1429 AA;  157115 MW;  CFD2CCA4 CRC32;



Query Match 15.3*; Score 292; DB 1; Length 1429; 

Best Local Similarity 40.5%; Pred. No, 9.66e-48; 

Matches 47; Conservative 24; Mismatches 35; Indels 10; Gaps 9; 

Db 333 ICNHGTCIDSPLSEKAFECQCEPGYEGILCEQDKNE-CLSENMCLNNGTCVNLPG-SFRC 390 

:| II I : I I: :| Ml I I IN |: II: |:|:: 111: : : |: I 
Qy 54 VCAHGCC-Q-PSSQSGFTCECEEGWMGPLCDQRTNDPCLG-NKCVH-GTCLPINAFSYSC 109

Db 391 DCARGFGGKWCDEP-L-NMCQDFHCENDGTCMHTSDHSPVCQCKNGFIGKRCEKE 443 

I I II III I I II : I : I I :: I |:|::|| I |::| 
Qy 110 KCLEGHGGVLCDEEEDLFNPCQMIKCKH-GKCRLSGVGQPYCECNSGFTGDSCDRE 164 



RESULT 2 

ID NTC3_MOUSE STANDARD; PRT; 2318 AA.

AC Q61982; 

DT 01-NOV-1997 (REL. 35, CREATED) 

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE) 

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS NOTCH 3 PROTEIN. 

GN NOTCH3. 

OS MUS MUSCULUS (MOUSE). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 
OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=ICR X SWISS WEBSTER;

RX MEDLINE; 95001556.

RA LARDELLI M., DALSTRAND J., LENDAHL U.;

RT "The novel Notch homologue mouse Notch 3 lacks specific epidermal

RT growth factor-repeats and is expressed in proliferating

RT neuroepithelium.";

RL MECH. DEV. 46:123-136(1994).

CC -!- FUNCTION: NOTCH 1, 2 AND 3 PLAY A COMBINATIONAL ROLE DURING
CC VARIOUS CELL FATE DECISIONS AND MORPHOLOGICAL MOVEMENTS IN THE
CC DEVELOPING CNS AND PROBABLY OTHER REGIONS OF THE EMBRYO.

CC -!- TISSUE SPECIFICITY: PROLIFERATING NEUROEPITHELIUM.

CC -!- DEVELOPMENTAL STAGE: CNS DEVELOPMENT. 

CC -!- SIMILARITY: CONTAINS 34 EGF-LIKE DOMAINS. 

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS. 

CC -!- SIMILARITY: CONTAINS 6 CDC10/SWI6 REPEATS. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; X74760; G483581; -.
DR MGD; MGI:99460; NOTCH3.
DR PROSITE; PS00010; ASX_HYDROXYL; 18.
DR PROSITE; PS00022; EGF_1; 33.
DR PROSITE; PS01186; EGF_2; 27.
DR PROSITE; PS01187; EGF_CA; 17.
DR PFAM; PF00008; EGF; 33.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE;
KW GLYCOPROTEIN.
FT DOMAIN 1 1643 EXTRACELLULAR.
FT TRANSMEM 1644 1664 POTENTIAL.
FT DOMAIN 1665 2318 CYTOPLASMIC.
FT DOMAIN 39 1374 34 X EGF-TYPE REPEATS.
FT DOMAIN 1388 1503 3 X LIN/NOTCH REPEATS.
FT DISULFID 147 156 BY SIMILARITY.
FT DISULFID 163 175 BY SIMILARITY.
FT DISULFID 169 184 BY SIMILARITY.
FT DISULFID 186 195 BY SIMILARITY.
FT DISULFID 202 213 BY SIMILARITY.
FT DISULFID 207 223 BY SIMILARITY.
FT DISULFID 225 234 BY SIMILARITY.
FT DISULFID 241 252 BY SIMILARITY.
FT DISULFID 246 261 BY SIMILARITY.
FT DISULFID 263 272 BY SIMILARITY.
FT DISULFID 279 292 BY SIMILARITY.
FT DISULFID 286 301 BY SIMILARITY.
FT DISULFID 303 312 BY SIMILARITY.
FT DISULFID 319 330 BY SIMILARITY.
FT DISULFID 324 339 BY SIMILARITY.
FT DISULFID 341 350 BY SIMILARITY.
FT DISULFID 356 367 BY SIMILARITY.
FT DISULFID 361 378 BY SIMILARITY.
FT DISULFID 380 389 BY SIMILARITY.
FT DISULFID 396 409 BY SIMILARITY.
Note: remainder of annotations omitted.



Query Match 14.4%; Score 274; DB 1; Length 2318;
Best Local Similarity 36.2%; Pred. No. 3.65e-43;
Matches 47; Conservative 21; Mismatches 50; Indels 12; Gaps 11;



Db 122 DPCVSRPCVHGAPCSVGPDGRFACACPPGYQGQSC-QSDIDECRSGTTCRHGGTCLNTPG 180 

:H : I II I :::: hi I I I I I I I I I II III : 
Qy 48 EPCHKKVCAHGC-CQPSSQSGFTCECEEGWMGPLCDQRTNDPCL-GNKCVHG-TCLPINA 104

Db 181 -SFRCQCPLGYTGLLC--ENPW-PCAPSPCRNGGTCRQSS-DVTYDCACLPGFEGQNCE 235 

hill 1:11 I: : II |::| III: III :|| |::|: 
Qy 105 FSYSCKCLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPY-CECNSGFTGDSCD 162



Db 236 VNVDDCPGHR 245

:: III
Qy 163 REIS-CRGER 171



RESULT 3
ID NOTC_XENLA STANDARD; PRT; 2524 AA.
AC P21783;
DT 01-MAY-1991 (REL. 18, CREATED)
DT 01-OCT-1996 (REL. 34, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG PRECURSOR (XOTCH PROTEIN).
GN XOTCH.
OS XENOPUS LAEVIS (AFRICAN CLAWED FROG).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; AMPHIBIA; BATRACHIA; ANURA;
OC MESOBATRACHIA; PIPOIDEA; PIPIDAE; XENOPODINAE; XENOPUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 90385285.
RA COFFMAN C., HARRIS W., KINTNER C.;
RT "Xotch, the Xenopus homolog of Drosophila notch.";
RL SCIENCE 249:1438-1441(1990).
RN [2]
RP REVISIONS TO 1759-1782.
RA KINTNER C.;
RL SUBMITTED (JUN-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DEVELOPMENTAL STAGE: EXPRESSED ALMOST UNIFORMLY IN EARLY EMBRYOS.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; M33874; G1364263; -.
DR PIR; A35844; A35844.
DR PROSITE; PS00010; ASX_HYDROXYL; 23.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 29.
DR PROSITE; PS01187; EGF_CA; 21.
DR PFAM; PF00008; EGF; 36.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
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SIMILARITY, 


FT DISULFID 1270 1283 BY SIMILARITY.
FT DISULFID 1275 1292 BY SIMILARITY.
FT DISULFID 1294 1303 BY SIMILARITY.
FT DISULFID 1310 1321 BY SIMILARITY.
FT DISULFID 1315 1333 BY SIMILARITY.
FT DISULFID 1335 1344 BY SIMILARITY.
FT DISULFID 1351 1362 BY SIMILARITY.
FT DISULFID 1356 1371 BY SIMILARITY.
FT DISULFID 1373 1382 BY SIMILARITY.
FT DISULFID 1390 1401 BY SIMILARITY.
FT DISULFID 1395 1412 BY SIMILARITY.
FT DISULFID 1414 1423 BY SIMILARITY.
FT CARBOHYD 462 462 POTENTIAL.
FT CARBOHYD 887 887 POTENTIAL.
Note: remainder of annotations omitted.



Query Match 13.8%; Score 262; DB 1; Length 2524;

Best Local Similarity 31.9%; Pred. No. 3.85e-40;

Matches 43; Conservative 31; Mismatches 49; Indels 12;

: : I II I: : :|: hi! II |: || |:


Db 


713 


Qy 


44 


Db 


771 


Qy 


103 


Db 


825 


Qy 


161 



I :|| I : h I hi :|| 



RESULT 4 

ID NTC1_MOUSE STANDARD; PRT; 2531 AA.

AC Q01705;

DT 01-NOV-1995 (REL. 32, CREATED)

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1 PRECURSOR (MOTCH PROTEIN).

GN NOTCH1 OR MOTCH.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS.

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=EMBRYO;

RX MEDLINE; 93194170.

RA FRANCO DEL AMO F., GENDRON-MAGUIRE M., SWIATEK P.J., JENKINS N.A.,

RA COPELAND N.G., GRIDLEY T.;

RT "Cloning, analysis, and chromosomal localization of Notch-1, a mouse

RT homolog of Drosophila Notch."; 

RL GENOMICS 15:259-264(1993). 

RN [2] 

RP SEQUENCE OF 1551-2170 FROM N.A. 

RC TISSUE=EMBRYO; 

RX MEDLINE; 93048835. 

RA FRANCO DEL AMO F., SMITH D.E., SWIATEK P.J., GENDRON-MAGUIRE M.,

RA GREENSPAN R.J., MCMAHON A.P., GRIDLEY T.;

RT "Expression pattern of Motch, a mouse homolog of Drosophila Notch, 

RT suggests an important role in early postimplantation mouse 

RT development."; 

RL DEVELOPMENT 115:737-744(1992).

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- DEVELOPMENTAL STAGE: EXPRESSED ALMOST UNIFORMLY IN EARLY EMBRYOS.

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; Z11886; G288503; -.

DR MGD; MGI:97363; NOTCH1.

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 27.

DR PROSITE; PS01187; EGF_CA; 21.

DR PFAM; PF00008; EGF; 35.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 3.

DR HSSP; P00740; 1IXA.

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.



FT SIGNAL 1 18 POTENTIAL.
FT CHAIN 19 2531 NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1.
FT DOMAIN 19 1725 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 1726 1746 POTENTIAL.
FT DOMAIN 1747 2531 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 24 1425 36 X EGF-TYPE REPEATS.
FT DOMAIN 1449 1462 CYS-RICH.
FT DOMAIN 1445 1562 3 X LIN/NOTCH REPEATS.
FT REPEAT 1445 1480 LIN/NOTCH 1.
FT REPEAT 1481 1522 LIN/NOTCH 2.
FT REPEAT 1523 1562 LIN/NOTCH 3.
FT DOMAIN 1865 2075 6 X ANK MOTIF REPEATS.
FT REPEAT 1865 1910 ANK MOTIF 1.
FT REPEAT 1912 1942 ANK MOTIF 2.
FT REPEAT 1944 1975 ANK MOTIF 3.
FT REPEAT 1978 2009 ANK MOTIF 4.
FT REPEAT 2011 2042 ANK MOTIF 5.
FT REPEAT 2044 2075 ANK MOTIF 6.
FT CARBOHYD 888 888 POTENTIAL.
FT CARBOHYD 959 959 POTENTIAL.
FT CARBOHYD 1179 1179 POTENTIAL.
FT CARBOHYD 1241 1241 POTENTIAL.
FT CARBOHYD 1489 1489 POTENTIAL.
FT CARBOHYD 1587 1587 POTENTIAL.
SQ SEQUENCE 2531 AA; 271312 MW; AD71189B CRC32;


Query Match 13.6%; Score 260; DB 1; Length 2531;
Best Local Similarity 33.6%; Pred. No. 1.22e-39;
Matches 41; Conservative 27; Mismatches 43; Indels 11; Gaps 10;

Db 714 LSEVNECNSNPCIHGACR-DGLNGYKCDCAPGWSGTNCDINNNE-CESNPCVNGGTCKDM 771 

I: : I: : I II I: : :|: hi II III hi :| Ihl II : 
Qy 44 LPGCEPCHKKVCAHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPI 102 

Db 772 TS-GYVCTCREGFSGPNC--QTNI-NECASNPCLNQGTC-IDDVA-GYKCNCPLPYTGAT 825

: :l I I II :| I : :: I I I ::| I : |: I h| :|| : 
Qy 103 NAFSYSCKCLEGHGGVLCDEEEDLFNPCQMIKC-KHGKCRLSGVGQPY-CECNSGFTGDS 160 

Db 826 CE 827 

I: 

Qy 161 CD 162 



RESULT 5 
ID NTC1_RAT STANDARD; PRT; 2531 AA.

AC Q07008;

DT 01-NOV-1995 (REL. 32, CREATED)

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1 PRECURSOR.

GN NOTCH1.

OS RATTUS NORVEGICUS (RAT).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; RATTUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=SCHWANN CELL;

RX MEDLINE; 92111383.

RA WEINMASTER G., ROBERTS V.J., LEMKE G.;

RT "A homolog of Drosophila Notch expressed during mammalian 

RT development."; 

RL DEVELOPMENT 113:199-205(1991). 

CC -!- FUNCTION: REQUIRED FOR THE CORRECT DIFFERENTIATION OF A NUMBER
CC OF TISSUES.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- DEVELOPMENTAL STAGE: IN THE EMBRYO, HIGHEST LEVELS OCCUR BETWEEN
CC DAYS 12 AND 14 AND DECREASE RAPIDLY TO MUCH LOWER LEVELS IN THE
CC ADULT.

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.

CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; X57405; G57635; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 22.

DR PROSITE; PS00022; EGF_1; 35.

DR PROSITE; PS01186; EGF_2; 26.

DR PROSITE; PS01187; EGF_CA; 21. 

DR PFAM; PF00008; EGF; 35. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN. 



FT SIGNAL 1 18 POTENTIAL.
FT CHAIN 19 2531 NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 1.
FT DOMAIN 19 1723 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 1724 1746 POTENTIAL.
FT DOMAIN 1747 2531 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 20 58 EGF-LIKE 1.
FT DOMAIN 59 99 EGF-LIKE 2.
FT DOMAIN 102 139 EGF-LIKE 3.
FT DOMAIN 140 176 EGF-LIKE 4.
FT DOMAIN 178 216 EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 218 255 EGF-LIKE 6.
FT DOMAIN 257 293 EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 295 333 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 335 371 EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 372 410 EGF-LIKE 10.
FT DOMAIN 412 450 EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 452 488 EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 490 526 EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 528 564 EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 566 601 EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 603 639 EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 641 676 EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 678 714 EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 716 751 EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 753 789 EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 791 827 EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 829 867 EGF-LIKE 22.
FT DOMAIN 869 905 EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 907 943 EGF-LIKE 24.
FT DOMAIN 945 981 EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 983 1019 EGF-LIKE 26.
FT DOMAIN 1021 1057 EGF-LIKE 27, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1059 1095 EGF-LIKE 28.
FT DOMAIN 1097 1143 EGF-LIKE 29.
FT DOMAIN 1145 1181 EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1183 1219 EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1221 1265 EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1267 1305 EGF-LIKE 33.
FT DOMAIN 1307 1346 EGF-LIKE 34.
FT DOMAIN 1348 1384 EGF-LIKE 35.
FT DOMAIN 1387 1426 EGF-LIKE 36.
FT DOMAIN 1449 1462 CYS-RICH.
FT DOMAIN 1865 2076 6 X ANK MOTIF REPEATS.
FT REPEAT 1865 1910 ANK MOTIF 1.
FT REPEAT 1912 1942 ANK MOTIF 2.
FT REPEAT 1944 1975 ANK MOTIF 3.
FT REPEAT 1978 2009 ANK MOTIF 4.
FT REPEAT 2011 2042 ANK MOTIF 5.
FT REPEAT 2044 2076 ANK MOTIF 6.


FT DISULFID 24 37 BY SIMILARITY.
FT DISULFID 31 46 BY SIMILARITY.
FT DISULFID 48 57 BY SIMILARITY.
FT DISULFID 63 74 BY SIMILARITY.
FT DISULFID 68 87 BY SIMILARITY.
FT DISULFID 89 98 BY SIMILARITY.
FT DISULFID 106 117 BY SIMILARITY.
FT DISULFID 111 127 BY SIMILARITY.
FT DISULFID 129 138 BY SIMILARITY.
FT DISULFID 144 155 BY SIMILARITY.
FT DISULFID 149 164 BY SIMILARITY.
FT DISULFID 166 175 BY SIMILARITY.
FT DISULFID 182 195 BY SIMILARITY.
FT DISULFID 189 204 BY SIMILARITY.
FT DISULFID 206 215 BY SIMILARITY.
FT DISULFID 222 233 BY SIMILARITY.
FT DISULFID 227 243 BY SIMILARITY.
FT DISULFID 245 254 BY SIMILARITY.
FT DISULFID 261 272 BY SIMILARITY.
FT DISULFID 266 281 BY SIMILARITY.
FT DISULFID 283 292 BY SIMILARITY.
FT DISULFID 299 312 BY SIMILARITY.
FT DISULFID 306 321 BY SIMILARITY.
FT DISULFID 323 332 BY SIMILARITY.
FT DISULFID 339 350 BY SIMILARITY.
FT DISULFID 344 359 BY SIMILARITY.
FT DISULFID 361 370 BY SIMILARITY.
FT DISULFID 376 387 BY SIMILARITY.
FT DISULFID 381 398 BY SIMILARITY.
FT DISULFID 400 409 BY SIMILARITY.
FT DISULFID 416 429 BY SIMILARITY.
FT DISULFID 423 438 BY SIMILARITY.
FT DISULFID 440 449 BY SIMILARITY.
FT DISULFID 456 467 BY SIMILARITY.
FT DISULFID 461 476 BY SIMILARITY.
FT DISULFID 478 487 BY SIMILARITY.
FT DISULFID 494 505 BY SIMILARITY.
FT DISULFID 499 514 BY SIMILARITY.
FT DISULFID 516 525 BY SIMILARITY.
FT DISULFID 532 543 BY SIMILARITY.
FT DISULFID 537 552 BY SIMILARITY.
FT DISULFID 554 563 BY SIMILARITY.
FT DISULFID 570 580 BY SIMILARITY.
FT DISULFID 575 589 BY SIMILARITY.
FT DISULFID 591 600 BY SIMILARITY.
FT DISULFID 607 618 BY SIMILARITY.
FT DISULFID 612 627 BY SIMILARITY.
FT DISULFID 629 638 BY SIMILARITY.
FT DISULFID 645 655 BY SIMILARITY.
FT DISULFID 650 664 BY SIMILARITY.
FT DISULFID 666 675 BY SIMILARITY.



FT DISULFID 682 693 BY SIMILARITY.
FT DISULFID 687 702 BY SIMILARITY.
FT DISULFID 704 713 BY SIMILARITY.
FT DISULFID 720 730 BY SIMILARITY.
FT DISULFID 725 739 BY SIMILARITY.
FT DISULFID 741 750 BY SIMILARITY.
FT DISULFID 757 768 BY SIMILARITY.
FT DISULFID 762 777 BY SIMILARITY.
FT DISULFID 779 788 BY SIMILARITY.
FT DISULFID 795 806 BY SIMILARITY.
FT DISULFID 800 815 BY SIMILARITY.
FT DISULFID 817 826 BY SIMILARITY.
FT DISULFID 833 844 BY SIMILARITY.
FT DISULFID 838 855 BY SIMILARITY.
FT DISULFID 857 866 BY SIMILARITY.
FT DISULFID 873 884 BY SIMILARITY.
FT DISULFID 878 893 BY SIMILARITY.
FT DISULFID 895 904 BY SIMILARITY.
FT DISULFID 911 922 BY SIMILARITY.
FT DISULFID 916 931 BY SIMILARITY.
FT DISULFID 933 942 BY SIMILARITY.
FT DISULFID 987 998 BY SIMILARITY.
FT DISULFID 992 1007 BY SIMILARITY.
FT DISULFID 1009 1018 BY SIMILARITY.
FT DISULFID 1025 1036 BY SIMILARITY.
FT DISULFID 1030 1045 BY SIMILARITY.
FT DISULFID 1047 1056 BY SIMILARITY.
FT DISULFID 1063 1074 BY SIMILARITY.
FT DISULFID 1068 1083 BY SIMILARITY.
FT DISULFID 1085 1094 BY SIMILARITY.
FT DISULFID 1101 1122 BY SIMILARITY.
FT DISULFID 1116 1131 BY SIMILARITY.
FT DISULFID 1133 1142 BY SIMILARITY.
FT DISULFID 1149 1160 BY SIMILARITY.
FT DISULFID 1154 1169 BY SIMILARITY.
FT DISULFID 1171 1180 BY SIMILARITY.
FT DISULFID 1187 1198 BY SIMILARITY.
FT DISULFID 1192 1207 BY SIMILARITY.
FT DISULFID 1209 1218 BY SIMILARITY.
FT DISULFID 1225 1244 BY SIMILARITY.
FT DISULFID 1238 1253 BY SIMILARITY.
FT DISULFID 1255 1264 BY SIMILARITY.
FT DISULFID 1271 1284 BY SIMILARITY.
FT DISULFID 1276 1293 BY SIMILARITY.
FT DISULFID 1295 1304 BY SIMILARITY.
FT DISULFID 1311 1322 BY SIMILARITY.
FT DISULFID 1316 1334 BY SIMILARITY.
FT DISULFID 1336 1345 BY SIMILARITY.
FT DISULFID 1352 1363 BY SIMILARITY.
FT DISULFID 1357 1372 BY SIMILARITY.
FT DISULFID 1374 1383 BY SIMILARITY.
FT DISULFID 1391 1403 BY SIMILARITY.



Note: remainder of annotations omitted. 

Query Match 13.6%; Score 260; DB 1; Length 2531; 

Best Local Similarity 33.6%; Pred. No. 1.22e-39;

Matches 41; Conservative 27; Mismatches 43; Indels 11; Gaps 10; 

Db 714 LSEVNECNSNPCIHGACR-DGLNGYKCDCAPGWSGTNCDINNNE-CESNPCVNGGTCKDM 771 

I: : I: : I II I: : :|: hi II III hi :| Ihl II : 
Qy 44 LPGCEPCHKKVCAHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPI 102 

Db 772 TS-GYVCTCREGFSGPNC--QTNI-NECASNPCLNQGTC-IDDVA-GYKCNCPLPYTGAT 825

: :| I I II :| I : :: I I :;| I : |; I 1:1 :| ; 
Qy 103 NAFSYSCKCLEGHGGVLCDEEEDLFNPCQMIKC-KHGKCRLSGVGQPY-CECNSGFTGDS 160

Db 826 CE 827

I: 

Qy 161 CD 162 



RESULT 6
ID SLIT_DROME STANDARD; PRT; 1480 AA.

AC P24014;

DT 01-MAR-1992 (REL. 21, CREATED) 

DT 01-MAR-1992 (REL. 21, LAST SEQUENCE UPDATE) 

DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE) 

DE SLIT PROTEIN PRECURSOR. 

GN SLI. 

OS DROSOPHILA MELANOGASTER (FRUIT FLY).

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE; 91099665. 

RA ROTHBERG J.M., JACOBS J.R., GOODMAN C.S., ARTAVANIS-TSAKONAS S.;

RT "Slit: an extracellular protein necessary for development of midline 

RT glia and commissural axon pathways contains both EGF and LRR 

RT domains."; 

RL GENES DEV. 4:2169-2187(1990).

CC -!- FUNCTION: NECESSARY FOR DEVELOPMENT OF MIDLINE GLIA AND
CC COMMISSURAL AXON PATHWAYS. SLIT MAY INTERACT WITH EXTRACELLULAR
CC MATRIX MOLECULES.

CC -!- TISSUE SPECIFICITY: EXCRETED BY THE MIDLINE GLIA CELLS AND
CC EVENTUALLY DISTRIBUTED ALONG THE AXONS.

CC -!- ALTERNATIVE PRODUCTS: GIVES RISE TO 2 DISTINCT PROTEINS DIFFERING
CC BY 11 AA AT THE C-TERMINUS OF THE LAST EGF REPEAT.

CC -!- SIMILARITY: CONTAINS 7 EGF-LIKE DOMAINS.

CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH (LRR) SEGMENT IS FOUND IN
CC MANY PROTEINS. NUMBER IN THIS PROTEIN: 22, TWO BLOCKS OF 6 LRR'S
CC AND TWO BLOCKS OF 5 LRR'S.

CC -!- SIMILARITY: CONTAINS A C-TERMINAL CYSTINE KNOT-LIKE DOMAIN (CTCK).

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; X53959; G8615; -. 

DR PIR; A36665; A36665. 

DR FLYBASE; FBgn0003425; sli. 

DR PROSITE; PS00010; ASX_HYDROXYL; 3.

DR PROSITE; PS00022; EGF_1; 7.

DR PROSITE; PS01185; CTCK_1; 1.

DR PROSITE; PS01186; EGF_2; 5.

DR PROSITE; PS01187; EGF_CA; 2.

DR PROSITE; PS01225; CTCK_2; 1.

DR PFAM; PF00007; Cys_knot; 1.

DR PFAM; PF00008; EGF; 7.

DR PFAM; PF00054; laminin_G; 1.

DR PFAM; PF00560; LRR; 10. 

DR HSSP; P00740; 1IXA. 

KW NEUROGENESIS; GLYCOPROTEIN; SIGNAL; ALTERNATIVE SPLICING; 

KW EGF-LIKE DOMAIN; REPEAT; LEUCINE-REPEAT; DUPLICATION. 
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SLIT PROTEIN. 
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CONSERVED N- FLANKING REGION OF THE LRR, 
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LEUCINE-RICH REPEATS (1ST REGION) . 
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CONSERVED C-FLANKING REGION OF THE LRR. 
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CONSERVED N- FLANKING REGION OF THE LRR. 
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LEUCINE-RICH REPEATS (2ND REGION) . 


FT DOMAIN 453 518 CONSERVED C-FLANKING REGION OF THE LRR.
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CONSERVED N- FLANKING REGION OF THE LRR. 
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LEUCINE-RICH REPEATS (3RD REGION). 
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CONSERVED C-FLANKING REGION OF THE LRR. 
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CONSERVED N-FLANKING REGION OF THE LRR. 
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LEUCINE-RICH REPEATS (4TH REGION) . 
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CONSERVED C-FLANKING REGION OF THE LRR. 


FT REPEAT 105 115 LRR 1-1.
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LRR 1-2. 
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LRR 1-3. 
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LRR 1-4, 
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LRR 1-5, 


FT REPEAT 212 230 LRR 1-6.
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LRR 2-1, 
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LRR 2-2. 
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LRR 2-3. 
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LRR 2-4. 
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LRR 2-5. 
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LRR 2-6. 
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LRR 3-1. 
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LRR 3-2. 
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LRR 3-3. 
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LRR 3-4. 
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LRR 3-5. 
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LRR 4-1. 
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LRR 4-2. 
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LRR 4-4. 
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LRR 4-5, 
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EGF-LIKE 1. 
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EGF-LIKE 2. 
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EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL). 


FT DOMAIN 1024 1062 EGF-LIKE 4.
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EGF-LIKE 5, CALCIUM- BINDING (POTENTIAL). 
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EGF-LIKE 6. 


FT DOMAIN 1353 1392 EGF-LIKE 7.
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CTCK. 
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POTENTIAL. 
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BY SIMILARITY. 
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SQ 


SEQUENCE 1480 AA; 165752 MW; 2CD1C421 CRC32;



Query Match 13.3%; Score 254; DB 1; Length 1480; 

Best Local Similarity 32.6%; Pred. No. 3.86e-38;

Matches 45; Conservative 30; Mismatches 51; Indels 12; Gaps 11;
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Db 900 VRNDILAKCNACFEQPCQNQAQCVALPQREYQCLCQPGYHGKHCEFMI-DACYGNPCRNN 958 

II: l"l I : : I : :| : I I : I I | : |:| II | : 
Qy 39 MQTGILPGCEPCHKKVC-AHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKC-VH 96

Db 959 ATCTVLEEGRFSCQCAPGYTGARCETNID--D-CLGEIKCQNNATC-IDGV-ESYKCECQ 1013 

ill " :H I I I I: : I : I III ::: I : II ::| III 
Qy 97 GTCLPINAFSYSCKCLEGHGGVLCDEEEDLFNPCQ-MIKC-KHGKCRLSGVGQPY-CECN 153 



Db 1014 PGFSGEFCDTKIQFCSPE 1031

:||:|: II I I I
Qy 154 SGFTGDSCDREIS-CRGE 170



RESULT 7

ID NOTC_BRARE STANDARD; PRT; 2437 AA.

AC P46530;

DT 01-NOV-1995 (REL. 32, CREATED)
DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN PRECURSOR.

GN NOTCH. 

OS BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII;

OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA; 

OC CYPRINIDAE; RASBORINAE; DANIO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=EMBRYO;

RX MEDLINE; 94128602.

RA BIERKAMP C., CAMPOS-ORTEGA J.A.;

RT "A zebrafish homologue of the Drosophila neurogenic gene Notch and 

RT its pattern of transcription during early embryogenesis."; 

RL MECH. DEV. 43:87-100(1993). 

CC -!- FUNCTION: IMPLICATED IN CELL FATE SPECIFICATIONS DURING
CC EMBRYO DEVELOPMENT. MAY BE INVOLVED IN THE FORMATION OF THE
CC NEURAL PLATE, NOTOCHORD AND BRAIN VESICLES.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- DEVELOPMENTAL STAGE: EXPRESSED IN ALL CELLS IN PREGASTRULATION
CC STAGES. DURING GASTRULATION IS DIFFERENTIALLY EXPRESSED,
CC ACCUMULATING PREDOMINANTLY IN THE PRECHORDAL MESODERM AND
CC NOTOCHORD. AT THE END OF GASTRULATION, EXPRESSED ALONG THE
CC ANTERIOR-POSTERIOR AXIS INCLUDING THE DEVELOPING NEURAL PLATE
CC AND DIFFERENTIATING MESODERM. ALSO PRESENT IN THE DEVELOPING
CC BRAIN AND HEAD REGIONS.

CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.

CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; X69088; G433867; -.

DR PROSITE; PS00010; ASX_HYDROXYL; 23.

DR PROSITE; PS00022; EGF_1; 34.

DR PROSITE; PS01186; EGF_2; 28.

DR PROSITE; PS01187; EGF_CA; 22. 

DR PFAM; PF00008; EGF; 36. 

DR PFAM; PF00023; ank; 6. 

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA.

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 

KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN. 

FT SIGNAL 1 20 POTENTIAL. 

FT CHAIN 21 2437 NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN. 

FT DOMAIN 21 1724 EXTRACELLULAR (POTENTIAL). 

FT TRANSMEM 1725 1747 POTENTIAL. 

FT DOMAIN 1748 2437 CYTOPLASMIC (POTENTIAL).
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EGF-LIKE 1. 
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EGF-LIKE 2. 
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EGF-LIKE 3. 
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EGF-LIKE 4. 
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EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 6. 
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EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL). 


FT DOMAIN 334 370 EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
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EGF-LIKE 10. 
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» EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL), 
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EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL), 
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EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL), 
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EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 22, 
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EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 28. 
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EGF-LIKE 29. 
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EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL). 
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EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL), 
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EGF-LIKE 33. 
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EGF-LIKE 34. 
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EGF-LIKE 35. 
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EGF-LIKE 36, 
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3 X LIN/NOTCH REPEATS. 
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LIN/NOTCH 1. 
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LIN/NOTCH 2. 
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LIN/NOTCH 3. 
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Note; remainder of annotations omitted. 



Query Match 13.2%; Score 251; DB 1; Length 2437;

Best Local Similarity 34.4%; Pred. No. 2.16e-37; 



Matches 42; Conservative 23; Mismatches 45; Indels 12; Gaps 11; 

Db 718 CSSNPCIHGSCLDQINS-YRCVCEAGWMGRNCDININE-CLSNPCVNGGTCKDMTS-GYL 774 

I : I II I I : I II Mil II I: Ihl Ihl II : : :| 
Qy 50 CHKKVCAHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYS 108

Db 775 CTCRAGFSGPNC--QMNI-NECASNPCLNQGSC-IDDVA-GFKCNCMLPYTGEVCEPLA 829 

I I I :| I : :: I I I ::| I : |: : |:| :||: |: :: 
Qy 109 CKCLEGHGGVLCDEEEDLFNPCQMIKC-KHGKCRLSGVGQPY-CECNSGFTGDSCDREIS 166

Db 830 PC 831 

I 

Qy 167 -C 167 



RESULT 8 

ID CRB_DROME STANDARD; PRT; 2139 AA.

AC P10040; 

DT 01-MAR-1989 (REL. 10, CREATED) 

DT 01-MAY-1991 (REL. 18, LAST SEQUENCE UPDATE)

DT 15-DEC-1998 (REL. 37, LAST ANNOTATION UPDATE) 

DE CRUMBS PROTEIN PRECURSOR (95F). 

GN CRB.

OS DROSOPHILA MELANOGASTER (FRUIT FLY). 

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;

OC DROSOPHILIDAE; DROSOPHILA.

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=OREGON-R; TISSUE=EMBRYO;

RX MEDLINE; 90263104. 

RA TEPASS U., THERES C., KNUST E.;

RT "Crumbs encodes an EGF-like protein expressed on apical membranes of 

RT Drosophila epithelial cells and required for organization of 

RT epithelia.";

RL CELL 61:787-799(1990). 

RN [2] 

RP SEQUENCE OF 1663-1955 FROM N.A. 

RX MEDLINE; 87218537. 

RA KNUST E., DIETRICH U., TEPASS U., BREMER K.A., WEIGEL D., 

RA VAESSIN H., CAMPOS-ORTEGA J.A.;

RT "EGF homologous sequences encoded in the genome of Drosophila

RT melanogaster, and their relation to neurogenic genes.";

RL EMBO J. 6:761-766(1987).

CC -!- FUNCTION: MAY PLAY A ROLE IN THE DEVELOPMENT OF EPITHELIA,

CC POSSIBLY FOR THE ESTABLISHMENT AND/OR MAINTENANCE OF CELL

CC POLARITY. IT MAY ACT AS A SIGNAL.

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.

CC -!- PTM: PHOSPHORYLATED IN THE CYTOPLASMIC DOMAIN (POTENTIAL).

CC -!- SIMILARITY: CONTAINS 29 EGF-LIKE DOMAINS.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC

DR EMBL; M33753; G552087; ALT.SEQ. 

DR EMBL; X05144; E1746; -. 

DR EMBL; X05144; G929536; -. 

DR PIR; B26637; B26637. 

DR PIR; A35672; A35672. 

DR FLYBASE; FBgn0000368; crb. 

DR PROSITE; PS00010; ASX_HYDROXYL; 15.

DR PROSITE; PS00022; EGF_1; 26.

DR PROSITE; PS01186; EGF_2; 17.

DR PROSITE; PS01187; EGF_CA; 15.

DR PFAM; PF00008; EGF; 26.

DR PFAM; PF00054; laminin_G; 3.

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE; 
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267 
306 
348 
388 
427 
464 
501 
545 
582 
609 
648 
687 
725 
763 
802 
840 
904 
942 



303 
343 
386 
425 
463 
500 
532 
581 
611 
646 
685 
723 
761 



KW GLYCOPROTEIN; SIGNAL; PHOSPHORYLATION. 

FT SIGNAL 1 90 

FT CHAIN 91 2139 

FT DOMAIN 91 2084 

FT TRANSMEM 2085 2111 

FT DOMAIN 2112 2139 

FT DOMAIN 

FT DOMAIN 

FT DOMAIN 

FT DOMAIN 

FT DOMAIN 

FT DOMAIN 

FT DOMAIN 

FT DOMAIN 

FT DOMAIN 

FT DOMAIN 

FT DOMAIN
FT DOMAIN
FT DOMAIN

FT DOMAIN 

FT DOMAIN 

FT DOMAIN 840 902 

FT DOMAIN 904 940 

FT DOMAIN 942 978 

FT DOMAIN 980 1021 

FT DOMAIN 1207 1243 

FT DOMAIN 1481 1517 

FT DOMAIN 1759 1795 

FT DOMAIN 1797 1833 

FT DOMAIN 1835 1871 

FT DOMAIN 1874 1915 

FT DOMAIN 1915 1951 

FT DOMAIN 1953 1989 

FT DOMAIN 1991 2029 

FT DOMAIN 2030 2070 

FT DISULFID 271 282 

FT DISDLFID 

FT DISULFID 

FT DISULFID 

FT DISDLFID 

FT DISDLFID 

FT DISULFID 

FT DISDLFID 

FT DISULFID 

•DISDLFID 
DISULFID 
DISULFID 

FT DISDLFID 

FT DISDLFID 

FT DISDLFID 

FT DISDLFID 

FT DISDLFID 

FT DISDLFID 

FT DISDLFID 

FT DISDLFID 

FT DISDLFID 

FT DISDLFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISDLFID 

FT DISDLFID 

FT DISULFID 

FT DISDLFID 

FT DISDLFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 

FT DISULFID 



276 
293 
310 
315 
333 
352 
357 
376 
392 
397 
414 
431 
436 
453 
468 
473 
490 
505 
509 
522 
549 
556 
571 
586 
591 
604 
613 
618 
636 
652 
659 
675 
691 
696 
713 
729 
734 



291 
302 
321 
331 
342 
363 
374 
385 
403 
412 
424 
442 
451 
462 
479 
488 
499 
515 
520 
531 
562 
569 
580 
597 
602 
610 
624 
634 
645 
664 
673 
684 
702 
711 
722 
740 
749 



CRUMBS PROTEIN.
EXTRACELLULAR (POTENTIAL). 
POTENTIAL. 

CYTOPLASMIC (POTENTIAL). 
EGF-LIKE 1. 
EGF-LIKE 2. 
EGF-LIKE 3. 

EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL). 
EGF-LIKE 5. 
EGF-LIKE 6. 
EGF-LIKE 7. 
EGF-LIKE 8. 
EGF-LIKE 9. 
EGF-LIKE 10, 



EGF-LIKE 11 
EGF-LIKE 12, 
EGF-LIKE 13 
EGF-LIKE 14 
EGF-LIKE 15 
EGF-LIKE 16 
EGF-LIKE 17 
EGF-LIKE 18 
EGF-LIKE 19, 
EGF-LIKE 20, 
EGF-LIKE 21 
EGF-LIKE 22 
EGF-LIKE 23 
EGF-LIKE 24, 
EGF-LIKE 25 
EGF-LIKE 26 
EGF-LIKE 27, 
EGF-LIKE 28, 
EGF-LIKE 29 
BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY. 
BY SIMILARITY, 
BY SIMILARITY, 
BY SIMILARITY, 



CALCIUM-BINDING (POTENTIAL).

CALCIUM-BINDING (POTENTIAL).

CALCIUM-BINDING (POTENTIAL).

CALCIUM-BINDING (POTENTIAL).

CALCIUM-BINDING (POTENTIAL).

CALCIUM-BINDING (POTENTIAL).

CALCIUM-BINDING (POTENTIAL).



CALCIUM-BINDING (POTENTIAL).
CALCIUM-BINDING (POTENTIAL).



CALCIUM-BINDING (POTENTIAL).
CALCIUM-BINDING (POTENTIAL).



FT DISULFID 751 760 BY SIMILARITY.
FT DISULFID 767 778 BY SIMILARITY.
FT DISULFID 772 787 BY SIMILARITY.
FT DISULFID 789 799 BY SIMILARITY.
FT DISULFID 806 817 BY SIMILARITY.
FT DISULFID 811 826 BY SIMILARITY.
FT DISULFID 828 837 BY SIMILARITY.
FT DISULFID 844 855 BY SIMILARITY.
FT DISULFID 849 890 BY SIMILARITY.
FT DISULFID 892 901 BY SIMILARITY.
FT DISULFID 908 919 BY SIMILARITY.
FT DISULFID 913 928 BY SIMILARITY.
FT DISULFID 930 939 BY SIMILARITY.
FT DISULFID 946 957 BY SIMILARITY.
FT DISULFID 952 966 BY SIMILARITY.
FT DISULFID 968 977 BY SIMILARITY.
FT DISULFID 984 995 BY SIMILARITY.
FT DISULFID 989 1009 BY SIMILARITY.
FT DISULFID 1011 1020 BY SIMILARITY.
FT DISULFID 1211 1222 BY SIMILARITY.
FT DISULFID 1216 1231 BY SIMILARITY.
FT DISULFID 1233 1242 BY SIMILARITY.
FT DISULFID 1485 1496 BY SIMILARITY.
FT DISULFID 1490 1505 BY SIMILARITY.
FT DISULFID 1507 1516 BY SIMILARITY.
FT DISULFID 1763 1774 BY SIMILARITY.
FT DISULFID 1768 1783 BY SIMILARITY.
FT DISULFID 1785 1794 BY SIMILARITY.
FT DISULFID 1801 1812 BY SIMILARITY.
FT DISULFID 1806 1821 BY SIMILARITY.
FT DISULFID 1823 1832 BY SIMILARITY.
FT DISULFID 1839 1850 BY SIMILARITY.
FT DISULFID 1844 1859 BY SIMILARITY.
FT DISULFID 1861 1870 BY SIMILARITY.
FT DISULFID 1878 1889 BY SIMILARITY.
FT DISULFID 1883 1903 BY SIMILARITY.
FT DISULFID 1905 1914 BY SIMILARITY.
FT DISULFID 1919 1930 BY SIMILARITY.
FT DISULFID 1924 1939 BY SIMILARITY.
FT DISULFID 1941 1950 BY SIMILARITY.
FT DISULFID 1957 1968 BY SIMILARITY.
FT DISULFID 1962 1977 BY SIMILARITY.
FT DISULFID 1979 1988 BY SIMILARITY.
FT DISULFID 1995 2008 BY SIMILARITY.
FT DISULFID 2002 2017 BY SIMILARITY.
FT DISULFID 2019 2028 BY SIMILARITY.
FT CARBOHYD 37 37 POTENTIAL.
FT CARBOHYD 96 96 POTENTIAL.
FT CARBOHYD 198 198 POTENTIAL.
FT CARBOHYD 238 238 POTENTIAL.
FT CARBOHYD 239 239 POTENTIAL.
FT CARBOHYD 336 336 POTENTIAL.
FT CARBOHYD 400 400 POTENTIAL.
FT CARBOHYD 550 550 POTENTIAL.
FT CARBOHYD 565 565 POTENTIAL.
FT CARBOHYD 736 736 POTENTIAL.
FT CARBOHYD 746 746 POTENTIAL.
FT CARBOHYD 860 860 POTENTIAL.
FT CARBOHYD 884 884 POTENTIAL.
FT CARBOHYD 976 976 POTENTIAL.
FT CARBOHYD 1102 1102 POTENTIAL.
FT CARBOHYD 1114 1114 POTENTIAL.
FT CARBOHYD 1138 1138 POTENTIAL.
FT CARBOHYD 1192 1192 POTENTIAL.
FT CARBOHYD 1245 1245 POTENTIAL.
FT CARBOHYD 1255 1255 POTENTIAL.
FT CARBOHYD 1354 1354 POTENTIAL.
FT CARBOHYD 1363 1363 POTENTIAL.
FT CARBOHYD 1441 1441 POTENTIAL.
FT CARBOHYD 1454 1454 POTENTIAL.



Note: remainder of annotations omitted.
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Query Match 12.9%; 
Score 245; DB 1; Length 2139;
Best Local Similarity 31.7%; Pred. No. 6.68e-36;
Matches 40; Conservative 23; Mismatches 57; Indels 6; Gaps 6;



Db 271 CLNDPCMGHGTC-SSSPEGYECRCTARYSGKNCQKDNGSPCAKNPCENGGSCLENSEGNY 329 

I : I Ml I :!! I: I I II: II I I :| :|| : :| 
Qy 50 CHKKVC-AHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSY 107

Db 330 QCFCDPNHSGQHCETEVNIHPLCQTNPCLNNGACWIGGSGALTCECPKGYAGARCEVDT 389 

I I 1:1 I: I :: II I ::| I ::| I III |;:| |: : 
Qy 108 SCKCLEGHGGVLCDEEEDLFNPCQMIKC-KHGKCR-LSGVGQPYCECNSGFTGDSCDREI 165 

Db 390 DECASQ 395 
I :: 

Qy 166 S-CRGE 170 



RESULT 9 

ID NTC1_HUMAN STANDARD; PRT; 2444 AA.
AC P46531; 

DT 01-NOV-1995 (REL. 32, CREATED) 

DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE)
DT 01-FEB-1996 (REL. 33, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG 1 PRECURSOR (TRANSLOCATION-
DE ASSOCIATED NOTCH PROTEIN TAN-1) (FRAGMENT).
GN NOTCH1 OR TAN1.
OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA;
OC PRIMATES; CATARRHINI; HOMINIDAE; HOMO. 
RN [1] 

RP SEQUENCE FROM N.A.
RX MEDLINE; 91347367. 

RA ELLISEN L.W., BIRD J., WEST D.C., SORENG A.L., REYNOLDS T.C., 
RA SMITH S.D., SKLAR J.; 

RT "TAN-1, the human homolog of the Drosophila notch gene, is broken by 
RT chromosomal translocations in T lymphoblastic neoplasms."; 
RL CELL 66:649-661(1991). 

CC -!- FUNCTION: MAY BE IMPORTANT FOR NORMAL LYMPHOCYTE FUNCTION. IN
CC ALTERED FORM, MAY CONTRIBUTE TO TRANSFORMATION OR PROGRESSION 
CC IN SOME T-CELL NEOPLASMS. 
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- TISSUE SPECIFICITY: IN FETAL TISSUES MOST ABUNDANT IN SPLEEN,
CC BRAIN STEM AND LUNG. ALSO PRESENT IN MOST ADULT TISSUES WHERE IT
CC IS FOUND MAINLY IN LYMPHOID TISSUES.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.
CC -!- SIMILARITY: CONTAINS 36 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; M73980; G338675; -.
DR MIM; 190198; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 20.
DR PROSITE; PS00022; EGF_1; 34.
DR PROSITE; PS01186; EGF_2; 26.
DR PROSITE; PS01187; EGF_CA; 18.
DR PFAM; PF00008; EGF; 35.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN;
KW TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN.
FT SIGNAL 1 18 POTENTIAL.
FT CHAIN 19 >2444 NEUROGENIC LOCUS NOTCH PROTEIN HOMOLOG 1.
FT DOMAIN 19 1736 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 1737 1757 POTENTIAL.



FT DOMAIN 1758 >2444 CYTOPLASMIC (POTENTIAL).


FT DOMAIN 20 58 EGF-LIKE 1.
FT DOMAIN 59 99 EGF-LIKE 2.
FT DOMAIN 102 139 EGF-LIKE 3.
FT DOMAIN 140 176 EGF-LIKE 4.
FT DOMAIN 178 216 EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 218 255 EGF-LIKE 6.
FT DOMAIN 257 293 EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 295 333 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 335 371 EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 372 410 EGF-LIKE 10.
FT DOMAIN 412 450 EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 452 488 EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 490 526 EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 528 564 EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 566 601 EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 603 639 EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 641 676 EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 678 714 EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 716 751 EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 753 789 EGF-LIKE 20.
FT DOMAIN 791 827 EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 829 868 EGF-LIKE 22.
FT DOMAIN 870 906 EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 908 944 EGF-LIKE 24.
FT DOMAIN 946 982 EGF-LIKE 25, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 984 1020 EGF-LIKE 26.
FT DOMAIN 1022 1058 EGF-LIKE 27.
FT DOMAIN 1060 1096 EGF-LIKE 28.
FT DOMAIN 1098 1144 EGF-LIKE 29.


FT 


UUMAIN 




1182 


EGF-LIKE 30. 


FT DOMAIN 1184 1220 EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1222 1266 EGF-LIKE 32, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 1268 1306 EGF-LIKE 33.
FT DOMAIN 1308 1347 EGF-LIKE 34.
FT DOMAIN 1349 1385 EGF-LIKE 35.
FT DOMAIN 1388 1427 EGF-LIKE 36.
FT DOMAIN 1446 1563 3 X LIN/NOTCH REPEATS.
FT REPEAT 1446 1481 LIN/NOTCH 1.
FT REPEAT 1482 1523 LIN/NOTCH 2.
FT REPEAT 1524 1563 LIN/NOTCH 3.
FT DOMAIN 1876 2087 6 X ANK MOTIF REPEATS.
FT REPEAT 1876 1921 ANK MOTIF 1.
FT REPEAT 1923 1954 ANK MOTIF 2.
FT REPEAT 1956 1987 ANK MOTIF 3.
FT REPEAT 1990 2021 ANK MOTIF 4.
FT REPEAT 2023 2054 ANK MOTIF 5.
FT REPEAT 2056 2087 ANK MOTIF 6.
FT DOMAIN 1576 1579 POLY-VAL.




dumain 


iS™ 


1665 


POLY-ARG. 


FT DOMAIN 1729 1732 POLY-PRO.
FT DOMAIN 1741 1744 POLY-ALA.
FT DOMAIN 1902 1905 POLY-GLU.
FT DOMAIN 2260 2263 POLY-GLY.




UUMAIN 




?!?b 


DAT V-PT VI 


FT 


1Y1M4TM 
UUMAIN 


} 




rULi'rKU. 


FT DISULFID 24 37 BY SIMILARITY.
FT DISULFID 31 46 BY SIMILARITY.
FT DISULFID 48 57 BY SIMILARITY.
FT DISULFID 63 74 BY SIMILARITY.
FT DISULFID 68 87 BY SIMILARITY.
FT DISULFID 89 98 BY SIMILARITY.
FT DISULFID 106 117 BY SIMILARITY.




Ul&ULr 1U 






Bi alMILAKlli . 


FT DISULFID 129 138 BY SIMILARITY.
FT DISULFID 144 155 BY SIMILARITY.
FT DISULFID 149 164 BY SIMILARITY.
FT DISULFID 166 175 BY SIMILARITY.
FT DISULFID 182 195 BY SIMILARITY.
FT DISULFID 189 204 BY SIMILARITY.
FT DISULFID 206 215 BY SIMILARITY.
FT DISULFID 222 233 BY SIMILARITY.
FT DISULFID 227 243 BY SIMILARITY.
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FT 


DISULFID 


245 


254 


BY SIMILARITY 


FT DISULFID 261 272 BY SIMILARITY.
FT DISULFID 266 281 BY SIMILARITY.
FT DISULFID 283 292 BY SIMILARITY.
FT DISULFID 299 312 BY SIMILARITY.
FT DISULFID 306 321 BY SIMILARITY.
FT DISULFID 323 332 BY SIMILARITY.
FT DISULFID 339 350 BY SIMILARITY.
FT DISULFID 344 359 BY SIMILARITY.
FT DISULFID 361 370 BY SIMILARITY.
FT DISULFID 376 387 BY SIMILARITY.
FT DISULFID 381 398 BY SIMILARITY.
FT DISULFID 400 409 BY SIMILARITY.
FT DISULFID 416 429 BY SIMILARITY.
FT DISULFID 423 438 BY SIMILARITY.
FT DISULFID 440 449 BY SIMILARITY.
FT DISULFID 456 467 BY SIMILARITY.
FT DISULFID 461 476 BY SIMILARITY.
FT DISULFID 478 487 BY SIMILARITY.
FT DISULFID 494 505 BY SIMILARITY.
FT DISULFID 499 514 BY SIMILARITY.
FT DISULFID 516 525 BY SIMILARITY.
FT DISULFID 532 543 BY SIMILARITY.
FT DISULFID 537 552 BY SIMILARITY.
FT DISULFID 554 563 BY SIMILARITY.
FT DISULFID 570 580 BY SIMILARITY.
FT DISULFID 575 589 BY SIMILARITY.
FT DISULFID 591 600 BY SIMILARITY.
FT DISULFID 607 618 BY SIMILARITY.
FT DISULFID 612 627 BY SIMILARITY.
FT DISULFID 629 638 BY SIMILARITY.
FT DISULFID 645 655 BY SIMILARITY.
FT DISULFID 650 664 BY SIMILARITY.
FT DISULFID 666 675 BY SIMILARITY.
FT DISULFID 682 693 BY SIMILARITY.
FT DISULFID 687 702 BY SIMILARITY.
FT DISULFID 704 713 BY SIMILARITY.
FT DISULFID 720 730 BY SIMILARITY.
FT DISULFID 725 739 BY SIMILARITY.
FT DISULFID 741 750 BY SIMILARITY.
FT DISULFID 757 768 BY SIMILARITY.
FT DISULFID 762 777 BY SIMILARITY.
FT DISULFID 779 788 BY SIMILARITY.
FT DISULFID 795 806 BY SIMILARITY.
FT DISULFID 800 815 BY SIMILARITY.
FT DISULFID 817 826 BY SIMILARITY.
FT DISULFID 833 844 BY SIMILARITY.
FT DISULFID 838 855 BY SIMILARITY.
FT DISULFID 857 867 BY SIMILARITY.
FT DISULFID 874 885 BY SIMILARITY.
FT DISULFID 879 894 BY SIMILARITY.
FT DISULFID 896 905 BY SIMILARITY.
FT DISULFID 912 923 BY SIMILARITY.
FT DISULFID 917 932 BY SIMILARITY.
FT DISULFID 934 943 BY SIMILARITY.
FT DISULFID 988 999 BY SIMILARITY.
FT DISULFID 993 1008 BY SIMILARITY.
FT DISULFID 1010 1019 BY SIMILARITY.
FT DISULFID 1026 1037 BY SIMILARITY.
FT DISULFID 1031 1046 BY SIMILARITY.
FT DISULFID 1048 1057 BY SIMILARITY.
FT DISULFID 1064 1075 BY SIMILARITY.
FT DISULFID 1069 1084 BY SIMILARITY.
FT DISULFID 1086 1095 BY SIMILARITY.
FT DISULFID 1102 1123 BY SIMILARITY.
FT DISULFID 1117 1132 BY SIMILARITY.
FT DISULFID 1134 1143 BY SIMILARITY.
FT DISULFID 1150 1161 BY SIMILARITY.
FT DISULFID 1155 1170 BY SIMILARITY.
FT DISULFID 1172 1181 BY SIMILARITY.
FT DISULFID 1188 1199 BY SIMILARITY.
FT DISULFID 1193 1208 BY SIMILARITY.



Note: remainder of annotations omitted. 

Query Match 12.9%; Score 246; DB 1; Length 2444;

Best Local Similarity 33.6%; Pred. No. 3.78e-36; 

Matches 41; Conservative 26; Mismatches 44; Indels 11; Gaps 10; 

Db 714 LSEVNECNSNPCVHGACR-DSLNGYKCDCDPGWSGTNCDINNNE-CESNPCVNGGTCKDM 771 

: : I: : I II |: I :|: |:|: II Ml hi :l Ihl II : 
Qy 44 LPGCEPCHKKVCAHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPI 102 

Db 772 TS-GIVCTCREGFSGPNC-QTNI-NECASNPCLNKGTC-IDDVA-GYKCNCLLPYTGAT 825

: : I I II :| I : :: I I I : I I : |: I |:| :|| : 
Qy 103 NAFSYSCKCLEGHGGVLCDEEEDLFNPCQMIKC-KHGKCRLSGVGQPY-CECNSGFTGDS 160

Db 826 CE 827 

I: 

Qy 161 CD 162 



RESULT 10

ID NOTC_DROME STANDARD; PRT; 2703 AA.
AC P07207; P04154;
DT 01-NOV-1986 (REL. 03, CREATED)
DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)
DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)
DE NEUROGENIC LOCUS NOTCH PROTEIN PRECURSOR.
GN N.
OS DROSOPHILA MELANOGASTER (FRUIT FLY).
OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA;
OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA;
OC DROSOPHILIDAE; DROSOPHILA.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 86079539.
RA WHARTON K.A., JOHANSEN K.M., XU T., ARTAVANIS-TSAKONAS S.;
RT "Nucleotide sequence from the neurogenic locus notch implies a gene
RT product that shares homology with proteins containing EGF-like
RT repeats.";
RL CELL 43:567-581(1985).
RN [2]
RP SEQUENCE FROM N.A.
RC STRAIN=OREGON-R;
RX MEDLINE; 87064624.
RA KIDD S., KELLEY M.R., YOUNG M.W.;
RT "Sequence of the notch locus of Drosophila melanogaster: relationship
RT of the encoded protein to mammalian clotting and growth factors.";
RL MOL. CELL. BIOL. 6:3094-3108(1986).
RN [3]
RP SEQUENCE OF 2505-2611 FROM N.A.
RX MEDLINE; 85099329.
RA WHARTON K.A., YEDVOBNICK B., FINNERTY V.G., ARTAVANIS-TSAKONAS S.;
RT "opa: a novel family of transcribed repeats shared by the Notch locus
RT and other developmentally regulated loci in D. melanogaster.";
RL CELL 40:55-62(1985).
RN [4]
RP SEQUENCE OF 1-8 FROM N.A.
RX MEDLINE; 87257846.
RA KELLEY M.R., KIDD S., BERG R.L., YOUNG M.W.;
RT "Restriction of P-element insertions at the Notch locus of Drosophila
RT melanogaster.";
RL MOL. CELL. BIOL. 7:1545-1548(1987).
RN [5]
RP REVIEW.
RA HARRIS W.A.;
RT "Many cell types specified by Notch function.";
RL CURR. BIOL. 1:120-122(1991).
CC -!- FUNCTION: NOTCH PROTEIN IS ESSENTIAL FOR PROPER DIFFERENTIATION OF
CC ECTODERM.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- SEPARATION OF NEUROBLASTS FROM THE ECTODERM INTO THE INNER PART
CC OF EMBRYO IS ONE OF THE FIRST STEPS OF CNS DEVELOPMENT IN INSECTS.
CC THIS PROCESS IS UNDER CONTROL OF THE NEUROGENIC GENES.
CC -!- SIMILARITY: HIGH, WITH OTHER NOTCH-TYPE PROTEINS.



Tue Jun 1 10:16:08 1999    US-09-191-647-14.rsp    Page



cc 


■!- SIMILARITY: CONTAINS 36 


EGF-LIKE DOMAINS, 


FT 


DOMAIN 


1259 


1295 




cc 


■!■ SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS. 


FT 


DOMAIN 


1297 


1335 


EGF-LIKE 33, 


cc 


-!■ SIMILARITY: CONTAINS 6 ANK REPEATS. 


FT 


DOMAIN 


1337 


1373 


EGF-LIKE 34, 


cc 










FT 


DOMAIN 


1375 


1412 


EGF-LIKE 35, 


cc 


This SWISS-PROT entry is copyright. It is produced through a collaboration 


FT 


DOMAIN 


1415 


1451 


EGF-LIKE 36, 


cc 


between 


the Swiss Institute of Bioinformatics and the EMBL outstation • 


FT 


DOMAIN 


1475 


1593 


3 X I.TN/NflTm RFPFATS 

J A Lilly MUl^n ncfLnlO. 


cc 


the European Bioinformatics Institute. There are no restrictions on its 


FT 


REPEAT 


1475 


1513 


LIN/NOTCH 1. 


cc 


use by 


non-profit institutions as long as its content is in no way 


FT 


REPEAT 


1514 


1553 


LIN/NOTCH 2. 


cc 


modified and this statement is not removed. Usage by and for commercial 


FT 


REPEAT 


1554 


1593 


LIN/NOTCH 3. 


cc 


entities requires a license 


agreement (See http://www.isb-sib.ch/announce/ 


FT 


DOMAIN 


1896 


2109 


6 X ANK MOTIF REPEATS. 


cc 


or send 


an email to license@isb-sib.ch), 


FT 


DOMAIN 


2538 


2568 


POLY-GLN (OPA- REPEAT) , 


cc 










FT 


DISULFID 


62 


73 


RY ^TMHIRTTY 


DR 


EMBL; M16152; G157988; -. 




FT 


DISULFID 


67 


83 


BY SIMILARITY. 


DR 


EMBL; M16153; G157988; JOINED. 


FT 


DISULFID 


85 


94 


BY SIMILARITY. 


DR 


EMBL; M16149; G157988; JOINED, 


FT 


DISULFID 


100 


111 


BY SIMILARITY, 


DR 


EMBL; M16150; G157988; JOINED. 


FT 


DISULFID 


105 


124 


BY SIMILARITY. 


DR 


EMBL; M16151; G157988; JOINED. 


FT 


DISULFID 


126 


135 


BY SIMILARITY. 


DR 


EMBL; K035O8; G157993; -. 




FT 


DISULFID 


143 


154 


BY SIMILARITY, 


DR 


EMBL; M13689; G157993; JOINED. 


FT 


DISULFID 


148 


164 


BY SIMILARITY, 


DR 


EMBL; KQ3507; G157993; JOINED. 


FT 


DISULFID 


166 


175 


BY SIMILARITY, 


DR 


EMBL; M12175; G95Q317; -. 




FT 


DISULFID 


181 


192 


BY SIMILARITY. 


ft 
I 


EMBL; M16025; G157995; 




FT 


DISULFID 


186 


203 


BY SIMILARITY. 


1 


PIR; A24420; A24420. 




FT 


DISULFID 


205 


214 


BY SIMILARITY, 




PIR; A24768; A24768. 




FT 


DISULFID 


221 


232 


BY SIMILARITY. 


DR 


PIR; A05267; A05267. 




FT 


DISULFID 


226 


241 


BY SIMILARITY, 


DR 


FLYBASE; FBgn0004647; N. 




FT 


DISULFID 


243 


252 


BY SIMILARITY. 


DR 


PROSITE; PS00010, 


ASXJYDROXYL; 22. 


FT 


DISULFID 


259 


270 


RY QTMTT.ARTTY 


DR 


PROSITE; PS00022; EGF_1; 34. 




FT 


DISULFID 


264 


279 


BY SIMILARITY. 


DR 


PROSITE 


PS01186; EGF_2; 28. 




FT 


DISULFID 


281 


290 


BY SIMILARITY, 


DR 


PROSITE; PS01187; EGF.CA; 22. 


FT 


DISULFID 


297 


308 


BY SIMILARITY, 


DR 


PFAM; PI 


00008; EGF; 36. 




FT 


DISULFID 


302 


317 


BY SIMILARITY, 


DR 


PFAM; PFO0O23; ank; 6. 




FT 


DISULFID 


319 


328 


BY SIMILARITY. 


DR 


PFAM; PFO0O66; notch; 3. 




FT 


DISULFID 


335 


349 


BY SIMILARITY. 


DR 


HSSP; P00740; 1IXA. 




FT 


DISULFID 


343 


358 


BY SIMILARITY. 


KW 


DIFFERENTIATION; NEUROGENESIS; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; 


FT 


DISULFID 


360 


369 


BY SIMILARITY. 


KW 


TRANSMEMBRANE; SIGNAL; GLYCOPROTEIN. 


FT 


DISULFID 


376 


387 


BY SIMILARITY. 


FT 


SIGNAL 


1 


44 


POTENTIAL. 


FT 


DISULFID 


381 


396 


BY SIMILARITY. 


FT 


CHAIN 


45 


2703 


NECROGENIC LOCOS NOTCH PROTEIN. 


FT 


DISULFID 


398 


407 


BY SIMILARITY. 


FT 


DOMAIN 


45 


1745 


EXTRACELLULAR (POTENTIAL). 


FT 


DISULFID 


413 


424 


BY SIMILARITY. 


FT 


TRANSMEM 1746 


1766 


POTENTIAL. 


FT 


DISULFID 


418 


435 


BY SIMILARITY. 


FT 


DOMAIN 


1767 


2703 


CYTOPLASMIC (POTENTIAL). 


FT 


DISULFID 


437 


446 


BY SIMILARITY. 


FT 


DOMAIN 


58 


1451 


36 X EGF-TYPE REPEATS, 


FT 


DISULFID 


453 


465 


BY SIMILARITY. 


FT 


DOMAIN 


58 


95 


EGF-LIKE 1, 


FT 


DISULFID 


459 


474 


BY SIMILARITY, 


FT 


DOMAIN 


96 


136 


EGF-LIKE 2, 


FT 


DISULFID 


476 


485 


BY SIMILARITY. 


FT 


DOMAIN 


139 


176 


EGF-LIKE 3, 


FT' 


DISULFID 


492 


503 


BY SIMILARITY, 


FT 


DOMAIN 


177 


215 


EGF-LIKE 4 . 


FT 


DISULFID 


497 


512 


BY SIMILARITY. 


FT 


DOMAIN 


217 


253 


EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


514 


523 


BY SIMILARITY. 


FT 


DOMAIN 


255 


291 


EGF-LIKE 6. 


FT 


DISULFID 


530 


541 


BY SIMILARITY. 


FT 


DOMAIN 


293 


329 


EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL). 


FT 


DISULFID 


535 


550 


BY SIMILARITY, 


FT DOMAIN 331 370 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DISULFID 552 561 BY SIMILARITY.
FT DOMAIN 372 408 EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DISULFID 568 579 BY SIMILARITY.
FT DOMAIN 409 447 EGF-LIKE 10.
FT DISULFID 573 588 BY SIMILARITY.
FT DOMAIN 449 486 EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DISULFID 590 599 BY SIMILARITY.
FT DOMAIN 488 524 EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DISULFID 606 616 BY SIMILARITY.
FT DOMAIN 526 562 EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DISULFID 611 625 BY SIMILARITY.
FT DOMAIN 564 600 EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DISULFID 627 636 BY SIMILARITY.
FT DOMAIN 602 637 EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DISULFID 643 654 BY SIMILARITY.
FT DOMAIN 639 675 EGF-LIKE 16, CALCIUM-BINDING (POTENTIAL).


FT DISULFID 648 663 BY SIMILARITY.
FT DOMAIN 677 713 EGF-LIKE 17, CALCIUM-BINDING (POTENTIAL).
FT DISULFID 665 674 BY SIMILARITY.
FT DOMAIN 715 751 EGF-LIKE 18, CALCIUM-BINDING (POTENTIAL).
FT DISULFID 681 692 BY SIMILARITY.
FT DOMAIN 753 789 EGF-LIKE 19, CALCIUM-BINDING (POTENTIAL).
FT DISULFID 686 701 BY SIMILARITY.
FT DOMAIN 791 827 EGF-LIKE 20, CALCIUM-BINDING (POTENTIAL).
FT DISULFID 703 712 BY SIMILARITY.
FT DOMAIN 829 865 EGF-LIKE 21, CALCIUM-BINDING (POTENTIAL).
FT DISULFID 719 730 BY SIMILARITY.
FT DOMAIN 867 905 EGF-LIKE 22.
FT DISULFID 724 739 BY SIMILARITY.
FT DOMAIN 907 944 EGF-LIKE 23, CALCIUM-BINDING (POTENTIAL).
FT DISULFID 741 750 BY SIMILARITY.
FT DOMAIN 946 982 EGF-LIKE 24, CALCIUM-BINDING (POTENTIAL).


FT DISULFID 757 768 BY SIMILARITY.
FT DOMAIN 984 1020 EGF-LIKE 25.
FT DISULFID 762 777 BY SIMILARITY.
FT DOMAIN 1022 1058 EGF-LIKE 26, CALCIUM-BINDING (POTENTIAL).
FT DISULFID 779 788 BY SIMILARITY.
FT DOMAIN 1060 1096 EGF-LIKE 27.
FT DISULFID 795 806 BY SIMILARITY.
FT DOMAIN 1098 1134 EGF-LIKE 28.
FT DISULFID 800 815 BY SIMILARITY.
FT DOMAIN 1136 1181 EGF-LIKE 29.
FT DISULFID 817 826 BY SIMILARITY.
FT DOMAIN 1183 1219 EGF-LIKE 30, CALCIUM-BINDING (POTENTIAL).
FT DISULFID 833 844 BY SIMILARITY.
FT DOMAIN 1221 1257 EGF-LIKE 31, CALCIUM-BINDING (POTENTIAL).
FT DISULFID 838 853 BY SIMILARITY.
FT DISULFID 855 864 BY SIMILARITY.

Note: remainder of annotations omitted. 

Query Match 12.8%; Score 244; DB 1; Length 2703;

Best Local Similarity 31.0%; Pred. No. 1.18e-35;

Matches 39; Conservative 30; Mismatches 47; Indels 10; Gaps 10; 

Db 566 DDCQSQPCRNRGICH-DSIAGYSCECPPGYTGTSCEININD-CDSNPCHRGKCID-DVNS 622 

: I: I :l I: I :|::||| III: II I :| | :| |: : | 
Qy 48 EPCHKKVC-AHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHGTCLPINAFS 106 

Db 623 FKCLCDPGYTGYIC-QKQ-I-NECESNPCQFDGHCQ-DRVGSYYCQCQAGTSGKNCEVNV 678 

: I I I I :| : : : I I: I II: II lhl :|.:| :|: :: 
Qy 107 YSCKCLEGHGGVLCDEEEDLFNPCQMIKCK-HGKCRLSGVGQPYCECNSGFTGDSCDREI 165 




Db 679 NECHSN 684



RESULT 11 

ID FBP3_STRPU STANDARD; PRT; 570 AA.

AC P49013; 

DT 01-FEB-1996 (REL. 33, CREATED) 

DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE) 

DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE) 

DE FIBROPELLIN C PRECURSOR (EPIDERMAL GROWTH FACTOR-RELATED PROTEIN 3)

DE (EGF III) (FIBROPELLIN III). 

GN EGF3. 

OS STRONGYLOCENTROTUS PURPURATUS (PURPLE SEA URCHIN). 

OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA; 

OC EUECHINOIDEA; ECHINACEA; ECHINOIDA; STRONGYLOCENTROTIDAE; 

OC STRONGYLOCENTROTUS.

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=GASTRULA;

RX MEDLINE; 93273088. 

RA BISGROVE B.W., RAFF R.A.; 

RT "The SpEGF III gene encodes a member of the fibropellins: EGF repeat-

RT containing proteins that form the apical lamina of the sea urchin

RT embryo."; 

RL DEV. BIOL. 157:526-538(1993). 

CC -!- FUNCTION: FORM THE APICAL LAMINA, A COMPONENT OF THE EXTRACELLULAR
CC MATRIX.
CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR.
CC -!- DEVELOPMENTAL STAGE: LOW LEVELS IN UNFERTILIZED EGGS AND DURING
CC EARLY CLEAVAGE, THEN RAPIDLY INCREASES IN ABUNDANCE BETWEEN LATE
CC MORULA AND MESENCHYME BLASTULA STAGES TO MAXIMAL LEVELS MAINTAINED
CC THROUGH SUBSEQUENT STAGES.
CC -!- EXPRESSED BOTH MATERNALLY AND ZYGOTICALLY.
CC -!- SIMILARITY: CONTAINS 8 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 1 CUB DOMAIN.
CC -!- SIMILARITY: THE C-TERMINAL DOMAIN OF THIS PROTEIN IS SIMILAR
CC TO AVIDIN/STREPTAVIDIN.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; L07045; G310660; -. 

DR PROSITE; PS00010; ASX_HYDROXYL; 8.

DR PROSITE; PS00022; EGF_1; 8.

DR PROSITE; PS00577; AVIDIN; 1.

DR PROSITE; PS01180; CUB; 1.

DR PROSITE; PS01186; EGF_2; 7.

DR PROSITE; PS01187; EGF_CA; 6.

DR PFAM; PF00008; EGF; 8. 

DR PFAM; PF00431; CUB; 1. 



DR HSSP; P00740; 1IXA. 



KW BIOTIN; EGF-LIKE DOMAIN; REPEAT; SIGNAL; GLYCOPROTEIN. 



FT SIGNAL 1 17 POTENTIAL.
FT CHAIN 18 570 FIBROPELLIN C.
FT DOMAIN 18 55 EGF-LIKE 1.
FT DOMAIN 62 175 CUB.
FT DOMAIN 176 212 EGF-LIKE 2, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 214 250 EGF-LIKE 3, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 252 288 EGF-LIKE 4, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 290 326 EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 328 364 EGF-LIKE 6, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 366 402 EGF-LIKE 7.
FT DOMAIN 404 440 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 442 570 AVIDIN-LIKE.


FT DISULFID 23 34 BY SIMILARITY.
FT DISULFID 28 43 BY SIMILARITY.
FT DISULFID 45 54 BY SIMILARITY.
FT DISULFID 180 191 BY SIMILARITY.
FT DISULFID 185 200 BY SIMILARITY.
FT DISULFID 202 211 BY SIMILARITY.
FT DISULFID 218 229 BY SIMILARITY.
FT DISULFID 223 238 BY SIMILARITY.
FT DISULFID 240 249 BY SIMILARITY.
FT DISULFID 256 267 BY SIMILARITY.
FT DISULFID 261 276 BY SIMILARITY.
FT DISULFID 278 287 BY SIMILARITY.
FT DISULFID 294 305 BY SIMILARITY.
FT DISULFID 299 314 BY SIMILARITY.
FT DISULFID 316 325 BY SIMILARITY.
FT DISULFID 332 343 BY SIMILARITY.
FT DISULFID 337 352 BY SIMILARITY.
FT DISULFID 354 363 BY SIMILARITY.
FT DISULFID 370 381 BY SIMILARITY.
FT DISULFID 375 390 BY SIMILARITY.
FT DISULFID 392 401 BY SIMILARITY.
FT DISULFID 408 419 BY SIMILARITY.
FT DISULFID 413 428 BY SIMILARITY.
FT DISULFID 430 439 BY SIMILARITY.


FT CARBOHYD 30 30 POTENTIAL.
FT CARBOHYD 136 136 POTENTIAL.
FT CARBOHYD 357 357 POTENTIAL.
SQ SEQUENCE 570 AA; 61116 MW; 265BC4BB CRC32;



Query Match 12.7%; Score 242; DB 1; Length 570; 

Best Local Similarity 32.0%; Pred. No. 3.69e-35;

Matches 40; Conservative 29; Mismatches 43; Indels 13; Gaps 12; 

Db 216 DECASAPCRNGGAC-VDQVNGYTCNCIPGFNGVNCENNINE-CASIPCLNGG-ICVDGIN 272 

: I hi I :|:ll:| III: h I : |::| : ::::: 
Qy 48 EPCHKKVCAHG-CCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHGTCLPINAFS 106 



Db 273 QFACTCLPGYTGILC--ETDI-NECASSPCQNGGSCT-DAVNR-YTCDCRAGFTGSNCET 327 

::| II I hll I :| I :| : I |:| :|||| :|: 

Qy 107 -YSCKCLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPY-CECNSGFTGDSCDR 163

Db 328 NINEC 332 
:|: I 

Qy 164 EIS-C 167 



RESULT 12 

ID NTC4_MOUSE STANDARD; PRT; 1964 AA. 

AC P31695; Q62389; 

DT 01-JUL-1993 (REL. 26, CREATED)

DT 01-NOV-1997 (REL. 35, LAST SEQUENCE UPDATE)

DT 15-JUL-1998 (REL. 36, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 4 PRECURSOR (TRANSFORMING

DE PROTEIN INT-3).

GN NOTCH4 OR INT3 OR INT-3.

OS MUS MUSCULUS (MOUSE).

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; 

OC RODENTIA; SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1]

RP SEQUENCE FROM N.A. 

RX MEDLINE; 92194507. 

RA ROBBINS J., BLONDEL B.J., GALLAHAN D., CALLAHAN R.;

RT "Mouse mammary tumor gene int-3: a member of the notch gene family 

RT transforms mammary epithelial cells."; 

RL J. VIROL. 66:2594-2599(1992).

RN [2] 

RP REVISIONS, SEQUENCE FROM N.A. 

RA CALLAHAN R.; 

RL SUBMITTED (NOV-1996) TO EMBL/GENBANK/DDBJ DATA BANKS. 

RN [3] 

RP SEQUENCE FROM N.A. 

RC TISSUE=LUNG, AND TESTIS;

RX MEDLINE; 96281668.

RA UYTTENDAELE H., MARAZZI G., WU G., YAN Q., SASSOON D., KITAJEWSKI J.;

RT "Notch4/int-3, a mammary proto-oncogene, is an endothelial

RT cell-specific mammalian Notch gene."; 

RL DEVELOPMENT 122:2251-2259(1996). 

CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DISEASE: ACTIVATED INT-3 TRANSFORMS MAMMARY EPITHELIAL CELLS.
CC -!- SIMILARITY: CONTAINS 29 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 CDC10/SWI6 REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; M80456; G1714084; -. 

DR EMBL; U43691; G1401160; -.

DR PIR; A38072; TVMVT3. 

DR MGD; MGI:107471; NOTCH4.

DR PROSITE; PS00010; ASX_HYDROXYL; 11.

DR PROSITE; PS00022; EGF_1; 28.

DR PROSITE; PS01186; EGF_2; 21.

DR PROSITE; PS01187; EGF_CA; 9.

DR PFAM; PF00008; EGF; 26.

DR PFAM; PF00023; ank; 6.

DR PFAM; PF00066; notch; 2.

DR HSSP; P00740; 1IXA.

KW DIFFERENTIATION; NEUROGENESIS; REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE;

KW GLYCOPROTEIN; PROTO-ONCOGENE; ANK REPEAT; SIGNAL. 



FT SIGNAL 1 20 POTENTIAL.
FT CHAIN 21 1964 NEUROGENIC LOCUS NOTCH HOMOLOG PROTEIN 4.
FT DOMAIN 21 1443 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 1444 1464 POTENTIAL.
FT DOMAIN 1465 1964 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 21 60 EGF-LIKE 1.
FT DOMAIN 61 112 EGF-LIKE 2.
FT DOMAIN 115 152 EGF-LIKE 3.
FT DOMAIN 153 189 EGF-LIKE 4.
FT DOMAIN 191 229 EGF-LIKE 5, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 231 271 EGF-LIKE 6.
FT DOMAIN 273 309 EGF-LIKE 7, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 311 350 EGF-LIKE 8, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 352 388 EGF-LIKE 9, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 389 427 EGF-LIKE 10.
FT DOMAIN 429 470 EGF-LIKE 11, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 472 508 EGF-LIKE 12, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 510 546 EGF-LIKE 13, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 548 584 EGF-LIKE 14, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 586 622 EGF-LIKE 15, CALCIUM-BINDING (POTENTIAL).
FT DOMAIN 622 656 EGF-LIKE 16.
FT DOMAIN 658 686 EGF-LIKE 17.
FT DOMAIN 688 724 EGF-LIKE 18.
FT DOMAIN 726 762 EGF-LIKE 19.
FT DOMAIN 764 800 EGF-LIKE 20.
FT DOMAIN 803 839 EGF-LIKE 21.





FT DOMAIN [illegible] [illegible] EGF-LIKE 22.
FT DOMAIN 878 924 EGF-LIKE 23.
FT DOMAIN 926 962 EGF-LIKE 24.
FT DOMAIN 964 1000 EGF-LIKE 25.
FT DOMAIN [illegible] 1040 EGF-LIKE 26.
FT DOMAIN 1042 1081 EGF-LIKE 27.
FT DOMAIN 1083 1122 EGF-LIKE 28.
FT DOMAIN 1126 1167 EGF-LIKE 29.
FT DOMAIN 1168 1282 3 X LIN/NOTCH REPEATS.
FT REPEAT 1168 1208 LIN/NOTCH 1.
FT REPEAT 1209 1242 LIN/NOTCH 2.
FT REPEAT 1243 1282 LIN/NOTCH 3.
FT DOMAIN 1572 1785 6 X ANK MOTIF REPEATS.
FT REPEAT [illegible] 1603 ANK MOTIF 1.
FT REPEAT 1622 1653 ANK MOTIF 2.
FT REPEAT 1654 1685 ANK MOTIF 3.
FT REPEAT [illegible] 1719 ANK MOTIF 4.
FT REPEAT 1721 1752 ANK MOTIF 5.
FT REPEAT [illegible] 1785 ANK MOTIF 6.




FT DISULFID [illegible] [illegible] BY SIMILARITY.
FT DISULFID 32 48 BY SIMILARITY.
FT DISULFID 50 59 BY SIMILARITY.
FT DISULFID 65 77 BY SIMILARITY.
FT DISULFID 71 [illegible] BY SIMILARITY.
FT DISULFID 102 111 BY SIMILARITY.
FT DISULFID 119 130 BY SIMILARITY.
FT DISULFID [illegible] [illegible] BY SIMILARITY.
FT DISULFID [illegible] 151 BY SIMILARITY.
FT DISULFID 157 [illegible] BY SIMILARITY.
FT DISULFID 162 177 BY SIMILARITY.
FT DISULFID 179 188 BY SIMILARITY.
FT DISULFID 195 208 BY SIMILARITY.


FT DISULFID 202 217 BY SIMILARITY.
FT DISULFID 219 228 BY SIMILARITY.
FT DISULFID 235 246 BY SIMILARITY.
FT DISULFID 240 259 BY SIMILARITY.
FT DISULFID 261 270 BY SIMILARITY.
FT DISULFID 277 288 BY SIMILARITY.
FT DISULFID 282 297 BY SIMILARITY.




FT DISULFID 299 308 BY SIMILARITY.
FT DISULFID [illegible] 329 BY SIMILARITY.
FT DISULFID 323 338 BY SIMILARITY.
FT DISULFID 340 349 BY SIMILARITY.
FT DISULFID 356 367 BY SIMILARITY.
FT DISULFID 361 376 BY SIMILARITY.
FT DISULFID 378 387 BY SIMILARITY.
FT DISULFID 393 [illegible] BY SIMILARITY.
FT DISULFID [illegible] [illegible] BY SIMILARITY.
FT DISULFID 417 426 BY SIMILARITY.
FT DISULFID 433 449 BY SIMILARITY.
FT DISULFID 443 458 BY SIMILARITY.
FT DISULFID 460 469 BY SIMILARITY.
FT DISULFID [illegible] [illegible] BY SIMILARITY.
FT DISULFID 481 496 BY SIMILARITY.
FT DISULFID 498 [illegible] BY SIMILARITY.
FT DISULFID [illegible] 525 BY SIMILARITY.
FT DISULFID 519 [illegible] BY SIMILARITY.
FT DISULFID 536 545 BY SIMILARITY.
FT DISULFID 552 563 BY SIMILARITY.
FT DISULFID 557 572 BY SIMILARITY.
FT DISULFID 574 583 BY SIMILARITY.


FT DISULFID 590 601 BY SIMILARITY.
FT DISULFID 595 610 BY SIMILARITY.
FT DISULFID 612 621 BY SIMILARITY.
FT DISULFID 626 637 BY SIMILARITY.
FT DISULFID 631 646 BY SIMILARITY.
FT DISULFID 648 655 BY SIMILARITY.
FT DISULFID 662 669 BY SIMILARITY.
FT DISULFID 664 674 BY SIMILARITY.
FT DISULFID 676 685 BY SIMILARITY.
FT DISULFID 692 703 BY SIMILARITY.
FT DISULFID 697 712 BY SIMILARITY.
FT DISULFID 714 723 BY SIMILARITY.
FT DISULFID 730 741 BY SIMILARITY.
FT DISULFID 735 750 BY SIMILARITY.
FT DISULFID 752 761 BY SIMILARITY.
FT DISULFID 768 779 BY SIMILARITY.
FT DISULFID 773 788 BY SIMILARITY.
FT DISULFID 790 799 BY SIMILARITY.
FT DISULFID 807 818 BY SIMILARITY.
FT DISULFID 812 827 BY SIMILARITY.
FT DISULFID 829 838 BY SIMILARITY.
FT DISULFID 845 856 BY SIMILARITY.
FT DISULFID 850 865 BY SIMILARITY.
FT DISULFID 867 876 BY SIMILARITY.
FT DISULFID 882 903 BY SIMILARITY.
FT DISULFID 897 912 BY SIMILARITY.
FT DISULFID 914 923 BY SIMILARITY.
FT DISULFID 930 941 BY SIMILARITY.


FT DISULFID 935 950 BY SIMILARITY.
FT DISULFID 952 961 BY SIMILARITY.
FT DISULFID 968 979 BY SIMILARITY.
FT DISULFID 973 988 BY SIMILARITY.
FT DISULFID 990 999 BY SIMILARITY.
FT DISULFID 1006 1019 BY SIMILARITY.
FT DISULFID 1011 1028 BY SIMILARITY.
FT DISULFID 1030 1039 BY SIMILARITY.
FT DISULFID 1046 1057 BY SIMILARITY.
FT DISULFID 1051 1069 BY SIMILARITY.
FT DISULFID 1071 1080 BY SIMILARITY.
FT DISULFID 1087 1098 BY SIMILARITY.
FT DISULFID 1092 1110 BY SIMILARITY.
FT DISULFID 1112 1121 BY SIMILARITY.
FT DISULFID 1130 1142 BY SIMILARITY.
FT DISULFID 1136 1155 BY SIMILARITY.
FT DISULFID 1157 1166 BY SIMILARITY.


FT CARBOHYD 711 711 POTENTIAL.
FT CARBOHYD 960 960 POTENTIAL.
FT CARBOHYD 1139 1139 POTENTIAL.
FT CONFLICT 43 43 Q -> R (IN REF. 3).
FT CONFLICT 298 298 L -> P (IN REF. 3).
FT CONFLICT 884 884 M -> K (IN REF. 3).



Note: remainder of annotations omitted. 

Query Match 12.7%; Score 241; DB 1; Length 1964;

Best Local Similarity 37.2%; Pred. No. 6.52e-35;
Matches 48; Conservative 17; Mismatches 48; Indels 16; Gaps 13; 

Db 966 EACQSQPCHNHGTC-TSRPGGFHCACPPGFVGLRCEGDVDE-CLDRPCHPSGTAACHSLA 1023 

1:1: I II I I :||| I I :| |: :: II I II I :: 
Qy 48 EPCHKKVC-AHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVH-GT-CLPI- 102 

Db 1024 NAF-Y-CQCLPGHTGQRCEVEMDL---CQSQPCSNGGSCEITTGPPPGFTCHCPKGFEGP 1078

III II II II I I: I II II 1:1 I;: I : I I II I 
Qy 103 NAFSYSCKCLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQP-Y-CECNSGFTGD 159 

Db 1079 TCSHKALSC 1087 

:| : :M 
Qy 160 SCD-REISC 167 



RESULT 13 

ID GLP1_CAEEL STANDARD; PRT; 1295 AA.

AC P13508;

DT 01-JAN-1990 (REL. 13, CREATED) 

DT 01-JAN-1990 (REL. 13, LAST SEQUENCE UPDATE) 

DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE) 

DE GLP-1 PROTEIN PRECURSOR.

GN GLP-1 OR EMB-33 OR F02A9.6. 

OS CAENORHABDITIS ELEGANS. 

OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA; 

OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS. 



RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=BRISTOL N2;

RX MEDLINE; 89336787. 

RA YOCHEM J., GREENWALD I.; 

RT "glp-1 and lin-12, genes implicated in distinct cell-cell 

RT interactions in C. elegans, encode similar transmembrane proteins."; 

RL CELL 58:553-563(1989). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=BRISTOL N2;

RX MEDLINE; 94150718. 

RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,

RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,

RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FRASER A.,

RA FULTON L., GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M.,

RA JOHNSTON L., JONES M., KERSHAW J., KIRSTEN J., LAISSTER N.,

RA LATREILLE P., LIGHTNING J., LLOYD C., MORTIMORE B., O'CALLAGHAN M.,

RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,

RA SIMS M., SMALDON N., SMITH A., SMITH M., SONNHAMMER E., STADEN R.,

RA SULSTON J., THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K.,

RA WATERSON R., WATSON A., WEINSTOCK L., WILKINSON-SPROAT J.,

RA WOHLDMAN P.;

RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C. 

RT elegans."; 

RL NATURE 368:32-38(1994). 

RN [3] 

RP DELETION OF 1174-1295.

RX MEDLINE; 91351288. 

RA MANGO S.E., MAINE E.M., KIMBLE J.; 

RT "Carboxyterminal truncation activates glp-1 protein to specify 

RT vulval fates in Caenorhabditis elegans."; 

RL NATURE 352:811-815(1991). 

RN [4] 

RP CHARACTERIZATION OF FUNCTION OF THE ANK-REPEATS.

RX MEDLINE; 93354444.

RA ROEHL H., KIMBLE J.;

RT "Control of cell fate in C. elegans by a GLP-1 peptide consisting

RT primarily of ankyrin repeats.";

RL NATURE 364:632-635(1993). 

RN [5] 

RP FUNCTION. 

RX MEDLINE; 94208066. 

RA MELLO C.C., DRAPER B.W., PRIESS J.R.; 

RT "The maternal genes apx-1 and glp-1 and establishment of 

RT dorsal-ventral polarity in the early C. elegans embryo.";

RL CELL 77:95-106(1994). 

CC -!- FUNCTION: INVOLVED IN THE SPECIFICATION OF THE CELL FATES OF THE
CC THE BLASTOMERES, ABA AND APA. PROPER SIGNALLING BY GLP-1 INDUCES
CC ABA DESCENDANTS TO PRODUCE ANTERIOR PHARYNGEAL CELLS, AND APA
CC DESCENDANTS TO ADOPT A DIFFERENT FATE. CONTRIBUTES TO THE
CC ESTABLISHMENT THE DORSAL-VENTRAL AXIS IN EARLY EMBRYOS.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- DEVELOPMENTAL STAGE: ACTS ON ABP DEVELOPMENT DURING 4-CELL AND
CC 12-CELL STAGES, AND ON ABA DEVELOPMENT DURING 12-CELL AND 28-CELL
CC STAGES.
CC -!- SIMILARITY: HIGH, TO C. ELEGANS LIN-12.
CC -!- SIMILARITY: CONTAINS 10 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 3 LIN/NOTCH REPEATS.
CC -!- SIMILARITY: CONTAINS 6 ANK REPEATS.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; M25580; G156317; -. 

DR EMBL; Z19555; E1322024; -. 

DR EMBL; Z29116; E1322024; JOINED. 

DR EMBL; Z29116; E1323609; -.

DR EMBL; Z19555; E1323609; JOINED.

DR PIR; A32901; A32901. 

DR WORMPEP; F02A9.6; CE00237. 

DR PROSITE; PS00010; ASX_HYDROXYL; 2.

DR PROSITE; PS00022; EGF_1; 10.

DR PROSITE; PS01186; EGF_2; 8.

DR PROSITE; PS01187; EGF_CA; 1.

DR PFAM; PF00008; EGF; 10. 

DR PFAM; PF00023; ank; 4. 

DR PFAM; PF00066; notch; 3. 

DR HSSP; P00740; 1IXA. 

KW DIFFERENTIATION; REPEAT; ANK REPEAT; EGF-LIKE DOMAIN; TRANSMEMBRANE; 

KW GLYCOPROTEIN; SIGNAL.



FT SIGNAL 1 15 POTENTIAL.
FT CHAIN 16 1295 GLP-1 PROTEIN.
FT DOMAIN 16 764 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 765 786 POTENTIAL.
FT DOMAIN 787 1295 CYTOPLASMIC (POTENTIAL).
FT DOMAIN 493 607

SQ SEQUENCE 1295 AA; 144078 MW; B468B73D CRC32;



Query Match 12.5%; Score 239; DB 1; Length 1295; 

Best Local Similarity 31.5%; Pred. No. 2.03e-34;

Matches 40; Conservative 28; Mismatches 50; Indels 9; Gaps 9; 

Db 112 TPLFSGVNPCDSDPCNNGLCYPFY-GGFQCICNNGYGGSYCEE-GIDHCAQNECAEGSTC 169

I -I HI I :l I I :|l I |::| |: |:: I I I I I II 
Qy 41 TGILPGCEPCHKKVCAHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TC 99

Db 170 VN-SVYNYYCDCPIGKSGRYCERTECALMG-N-I-CNHGRCIPNRDEDKNFRCVCDSGYE 225 

: : ::| I I I :| I: I : | |:||:| : : | 
Qy 100 LPINAFSYSCKCLEGHGGVLCDEEEDLFNPCQMIKCKHGKCRLS-GVGQPY-CECNSGFT 157

Db 226 GEFCNKD 232 

I: I": 
Qy 158 GDSCDRE 164 



RESULT 14 

ID DL_DROME STANDARD; PRT; 880 AA.

AC P10041; 

DT 01-MAR-1989 (REL. 10, CREATED)

DT 01-MAR-1989 (REL. 10, LAST SEQUENCE UPDATE)

DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)

DE NEUROGENIC LOCUS DELTA PROTEIN PRECURSOR. 

GN DL. 

OS DROSOPHILA MELANOGASTER (FRUIT FLY).

OC EUKARYOTA; METAZOA; ARTHROPODA; TRACHEATA; HEXAPODA; INSECTA; 

OC PTERYGOTA; DIPTERA; BRACHYCERA; MUSCOMORPHA; EPHYDROIDEA; 

OC DROSOPHILIDAE; DROSOPHILA.

RN [1] 

RP SEQUENCE FROM N.A.

RA VAESSIN H., BREMER K.A., KNUST E., CAMPOS-ORTEGA J.A.;

RT "The neurogenic gene Delta of Drosophila melanogaster is expressed in 

RT neurogenic territories and encodes a putative transmembrane protein 

RT with EGF-like repeats."; 

RL EMBO J. 6:3431-3440(1987). 

RN [2] 

RP SEQUENCE OF 422-621 FROM N.A. 

RX MEDLINE; 87218537. 

RA KNUST E., DIETRICH U., TEPASS O., BREMER K.A., WEIGEL D., VAESSIN H.,

RA CAMPOS-ORTEGA J.A.;

RT "EGF homologous sequences encoded in the genome of Drosophila

RT melanogaster, and their relation to neurogenic genes.";

RL EMBO J. 6:761-766(1987). 

RN [3] 

RP PATTERN OF TRANSCRIPTION. 

RX MEDLINE; 91209246. 

RA HAENLIN M., KRAMATSCHEK B., CAMPOS-ORTEGA J.A.;

RT "The pattern of transcription of the neurogenic gene Delta of 

RT Drosophila melanogaster."; 

RL DEVELOPMENT 110:905-914(1990). 

CC -!- FUNCTION: ESSENTIAL FOR PROPER DIFFERENTIATION OF ECTODERM. DL
CC IS REQUIRED FOR THE CORRECT SEPARATION OF NEURAL AND EPIDERMAL
CC CELL LINEAGES.
CC -!- SUBCELLULAR LOCATION: TYPE I MEMBRANE PROTEIN.
CC -!- SEPARATION OF NEUROBLASTS FROM THE ECTODERM INTO THE INNER PART
CC OF EMBRYO IS ONE OF THE FIRST STEPS OF CNS DEVELOPMENT IN INSECTS.
CC THIS PROCESS IS UNDER CONTROL OF THE NEUROGENIC GENES.
CC -!- NOTCH AND SERRATE MAY INTERACT AT THE PROTEIN LEVEL. IT IS
CC CONCEIVABLE THAT THE SERRATE AND DELTA PROTEINS MAY COMPETE
CC FOR BINDING WITH THE NOTCH PROTEIN.
CC -!- SIMILARITY: CONTAINS 9 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: TO DROSOPHILA SERRATE PROTEIN.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; X06289; G7853; -.
DR EMBL; X05140; G929563; -.
DR PIR; S00670; S00670.
DR PIR; A26637; A26637.
DR FLYBASE; FBgn0000463; Dl.
DR PROSITE; PS00010; ASX_HYDROXYL; 3.
DR PROSITE; PS00022; EGF_1; 9.
DR PROSITE; PS01186; EGF_2; 9.
DR PROSITE; PS01187; EGF_CA; 2.
DR PFAM; PF00008; EGF; 8.
DR HSSP; P00740; 1IXA.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT; TRANSMEMBRANE;
KW EGF-LIKE DOMAIN; GLYCOPROTEIN; SIGNAL.

SQ SEQUENCE 880 AA; 94643 MW; E967E662 CRC32;

Query Match 12.3%; Score 234; DB 1; Length 880; 

Best Local Similarity 33.3%; Pred. No. 3.45e-33;

Matches 53; Conservative 30; Mismatches 60; Indels 16; Gaps 12; 

Db 226 CHIPKCAKGCEHGHCDKPNQCVCQLGWKGALCNECVLEP---N-CIHGTC---NK-PWTC 277

II llll: : I I: Ihhll:: -:| I Mill I : :| 
Qy 50 CHKKVCAHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHGTCLPINAFSYSC 109 

Db 278 ICNEGWGGLYCNQDLN-YCTNHR-PCKNGGTCFNTGEGL-YTCKCAPGYSGDDCENEIYS 334 

I II II: I::: : : : l|:| I =11 III :|::|| |: || I 



Qy 110 KCLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPY-CECNSGFTGDSCDREI-S 166

Db 335 CDAD-VNPC-QNGGTCIDEPHTKTGYKCHCRNGWSGKMC 371

I :: : I: II : II I :| I 

Qy 167 CRGERIRDYYQKQQGYAACQTTKKVSRLECRGGCAGGQC 205



RESULT 15
ID FBP1_STRPU STANDARD; PRT; 1064 AA.
AC P10079;
DT 01-MAR-1989 (REL. 10, CREATED)
DT 01-FEB-1996 (REL. 33, LAST SEQUENCE UPDATE)
DT 01-NOV-1997 (REL. 35, LAST ANNOTATION UPDATE)
DE FIBROPELLIN I PRECURSOR (EPIDERMAL GROWTH FACTOR-RELATED PROTEIN 1)
DE (UEGF-1).
GN EGF1.
OS STRONGYLOCENTROTUS PURPURATUS (PURPLE SEA URCHIN).
OC EUKARYOTA; METAZOA; ECHINODERMATA; ECHINOZOA; ECHINOIDEA;
OC EUECHINOIDEA; ECHINACEA; ECHINOIDA; STRONGYLOCENTROTIDAE;
OC STRONGYLOCENTROTUS.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 90112459.
RA DELGADILLO-REYNOSO M.G., ROLLO D.R., HURSH D.A., RAFF R.A.;
RT "Structural analysis of the uEGF gene in the sea urchin
RT Strongylocentrotus purpuratus reveals more similarity to vertebrate
RT than to invertebrate genes with EGF-like repeats.";
RL J. MOL. EVOL. 29:314-327(1989).
RN [2]
RP SEQUENCE OF 279-476 AND 781-1064 FROM N.A.
RX MEDLINE; 87319677.
RA HURSH D.A., ANDREWS M.E., RAFF R.A.;
RT "A sea urchin gene encodes a polypeptide homologous to epidermal
RT growth factor.";
RL SCIENCE 237:1487-1490(1987).
RN [3]
RP AVIDIN-LIKE DOMAIN.
RX MEDLINE; 89196806.
RA HUNT L.T., BARKER W.C.;
RT "Avidin-like domain in an epidermal growth factor homolog from a sea
RT urchin.";
RL FASEB J. 3:1760-1764(1989).
RN [4]
RP CHARACTERIZATION.
RX MEDLINE; 91285254.
RA BISGROVE B.W., ANDREWS M.E., RAFF R.A.;
RT "Fibropellins, products of an EGF repeat-containing gene, form a
RT unique extracellular matrix structure that surrounds the sea urchin
RT embryo.";
RL DEV. BIOL. 146:89-99(1991).
CC -!- FUNCTION: FORM THE APICAL LAMINA, A COMPONENT OF THE EXTRACELLULAR
CC MATRIX.
CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR. IN VESICLES IN THE CYTOPLASM
CC OF UNFERTILIZED EGGS, THEN TO THE BASE OF THE HYALIN LAYER
CC THROUGHOUT DEVELOPMENT AND FINALLY IN THE APICAL LAMINA IN LATE
CC EMBRYOS AND EARLY LARVAE.
CC -!- DEVELOPMENTAL STAGE: MODERATE LEVELS IN UNFERTILIZED EGGS AND
CC DURING EARLY CLEAVAGE, THEN RAPIDLY INCREASES IN ABUNDANCE BETWEEN
CC LATE MORULA AND MESENCHYME BLASTULA STAGES TO MAXIMAL LEVELS
CC MAINTAINED THROUGH SUBSEQUENT STAGES. EXPRESSED BOTH MATERNALLY
CC AND ZYGOTICALLY.
CC -!- ALTERNATIVE PRODUCTS: TWO FORMS (IA AND IB) ARE PRODUCED BY
CC ALTERNATIVE SPLICING. THE SMALL FORM (IB) LACKS 8 EGF REPEATS.
CC -!- SIMILARITY: CONTAINS 21 EGF-LIKE DOMAINS.
CC -!- SIMILARITY: CONTAINS 1 CUB DOMAIN.
CC -!- SIMILARITY: THE C-TERMINAL DOMAIN OF THIS PROTEIN IS SIMILAR
CC TO AVIDIN/STREPTAVIDIN.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; L08692; G161467; -. 

DR EMBL; L08692; G161466; -. 

DR EMBL; X17530; G667061; -. 

DR EMBL; M17421; G552260; -. 

DR EMBL; X17533; G667062; -. 

DR PIR; A29316; A29316. 

DR PROSITE; PS00010; ASX_HYDROXYL; 19.

DR PROSITE; PS00022; EGF_1; 19.

DR PROSITE; PS00577; AVIDIN; 1.

DR PROSITE; PS01180; CUB; 1.

DR PROSITE; PS01186; EGF_2; 19.

DR PROSITE; PS01187; EGF_CA; 19. 

DR PFAM; PF00008; EGF; 21. 

DR PFAM; PF00431; CUB; 1. 

DR HSSP; P01132; 1EPH.

FT DISULFID 810 819 BY SIMILARITY.
FT DISULFID [illegible] 837 BY SIMILARITY.
FT DISULFID 831 846 BY SIMILARITY.
FT DISULFID 848 857 BY SIMILARITY.
FT DISULFID 864 875 BY SIMILARITY.
FT DISULFID 869 884 BY SIMILARITY.
FT DISULFID 886 895 BY SIMILARITY.
FT DISULFID 902 913 BY SIMILARITY.
FT DISULFID 907 922 BY SIMILARITY.
FT DISULFID 924 933 BY SIMILARITY.
FT VARSPLIC 477 780 MISSING (IN FORM IB).
FT CARBOHYD 30 30 POTENTIAL.
FT CARBOHYD 136 136 POTENTIAL.
FT CARBOHYD 851 851 POTENTIAL.
FT CONFLICT 279 279 L -> S (IN REF. 2).
SQ SEQUENCE 1064 AA; 112072 MW; FBD10D48 CRC32;



Query Match 12.1%; Score 230; DB 1; Length 1064; 

Best Local Similarity 28.3%; Pred. No. 3.30e-32;

Matches 36; Conservative 29; Mismatches 51; Indels 11; Gaps 10; 

Db 710 DECASAPCQNGGVC-VDGVNGYVCNCAPGYTGDNCETEIDE-CASMPCLNGGACIEMVN- 766 

: I I :| I : :|: hi | | |: :: | : |;:| :|: : 
Qy 48 EPCHKKVCAHG-CCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAF 105 

Db 767 GYTCQCVAGYTGVIC-ETDI-DECASAPCQNGGVCTDTIN-GYICACVPGFTGSNCETN 822 

:hl h I M: I h : I I :| : I I :|||| :|: 
Qy 106 SYSCKCLEGHGGVLCDEEEDLFNPCQMIKCKHGKCRLSGVGQPY-CECNSGFTGDSCDRE 164 

Db 823 IDECASD 829 

I I ::. 
Qy 165 IS-CRGE 170 



Search completed: Fri May 28 09:41:43 1999 
Job time : 21 secs.
Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K. 
Distribution rights by Oxford Molecular Ltd 



MPsrch_pp protein - protein database search, using Smith-Waterman algorithm

Run on:          Fri May 28 09:42:?? 1999; MasPar time 17.34 Seconds
                 764.994 Million cell updates/sec
Tabular output not generated.

Title:           >US-09-191-647-14
Description:     (1-243) from US09191647.pep
Perfect Score:   1905
Sequence:        1 ILDVASLRQAPGENGTSFHG..........SSFVDEVEKWKCGCARCAS 243

Scoring table:   PAM 150
                 Gap 11

Searched:        179066 seqs, 54579741 residues

Post-processing: Minimum Match 0%
                 Listing first 45 summaries

Database:        sptrembl9
                 1:sp_archea 2:sp_bacteria 3:sp_fungi 4:sp_human
                 5:sp_invertebrate 6:sp_mammal 7:sp_mhc 8:sp_organelle
                 9:sp_phage 10:sp_plant 11:sp_rodent 12:sp_unclassified
                 13:sp_vertebrate 14:sp_virus

Statistics:      Mean 40.603; Variance 58.453; scale 0.695

Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.
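
The throughput figure in the run header can be cross-checked against the other header values. The sketch below is only an inference from the printed numbers, not something the report states: it assumes one "cell update" is a single query-residue by database-residue comparison, so the rate is roughly query length times residues searched, divided by the MasPar time. The function name `cell_updates_per_sec` is hypothetical.

```python
def cell_updates_per_sec(query_len, db_residues, seconds, strands=1):
    """Estimate dynamic-programming cell updates per second.

    Assumption: every query residue is compared against every database
    residue once per strand searched (strands=2 for the 'bases x 2' case).
    """
    return query_len * db_residues * strands / seconds


# Values copied from the protein search header: 243-residue query,
# 54579741 database residues, 17.34 s of MasPar time.
rate = cell_updates_per_sec(243, 54_579_741, 17.34)
print(f"{rate / 1e6:.1f} million cell updates/sec")
```

The estimate lands within rounding of the reported 764.994 million, since the elapsed time is printed to only two decimals.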



Result         Query
   No.  Score  Match  Length  DB  ID      Description             Pred. No.
     1   1060   55.6    1531  11  088279  MEGF4.                  1.45e-265
     2    990   52.0     739   4  075094  MEGF5 (FRAGMENT).       2.96e-245
     3    988   51.9    1523  11  088280  MEGF5.                  1.12e-244
     4    405   21.3      79   4  075093  MEGF4 (FRAGMENT).       2.61e-79
     5    294   15.4    2447  13  013149  NOTCH 2 (FRAGMENT).     1.37e-49
     6    290   15.2    1203  11  Q06008  NOTCH PROTEIN HOMOLOG   1.52e-48
     7    290   15.2    2470  11  035516  CELL SURFACE PROTEIN.   1.52e-48
     8    270   14.2    1722   5  Q19350  SIMILAR TO EGF-LIKE RE  2.38e-43
     9    269   14.1    1219  11  Q63722  JAGGED PROTEIN.         4.30e-43
    10    260   13.6     752  13  042374  NOTCH RECEPTOR PROTEIN  8.90e-41
    11    258   13.5    1218   4  014902  TRANSMEMBRANE PROTEIN   2.90e-40
    12    258   13.5    1218   4  Q15816  TRANSMEMBRANE PROTEIN   2.90e-40
    13    258   13.5    1218   4  015122  JAGGED1.                2.90e-40
    14    258   13.5    1227   4  P78504  JAGGED 1 (TRANSMEMBRAN  2.90e-40
    15    256   13.4    1193  13  Q90819  C-SERATE-1 PROTEIN (FR  9.42e-40
    16    256   13.4    1212  13  042347  C-SERRATE-2 (FRAGMENT)  9.42e-40
    17    254   13.3     530   5  Q24526  SLIT LOCUS ENCODING A   3.06e-39
    18    252   13.2    2653   5  025253  NOTCH HOMOLOG SCALLOPE  9.91e-39
    19    250   13.1    1687  11  Q61204  NOTCH2-LIKE (EGF REPEA  3.21e-38
    20    244   12.8    2352   5  061240  HRNOTCH PROTEIN.        1.07e-36
    21    243   12.8    2531   5  016004  NOTCH HOMOLOG.          1.92e-36
    22    240   12.6     406   5  Q25059  FIBROPELLIN III (FRAGM  1.11e-35
    23    240   12.6    1964  11  035442  NOTCH4.                 1.11e-35
    24    237   12.4     601   5  Q20204  F40E10.4 PROTEIN (FRAG  6.33e-35
    25    236   12.4    1095   4  Q99458  NOTCH4 (FRAGMENT).      1.13e-34
    26    236   12.4    1999   4  Q99940  NOTCH4.                 1.13e-34
    27    236   12.4    2003   4  000306  NOTCH4.                 1.13e-34
    28    234   12.3     473   5  025464  ADHESIVE PLAQUE MATRIX  3.60e-34
    29    233   12.2     832   5  Q99108  NEUROGENIC LOCUS DELTA  6.43e-34
    30    233   12.2    4590   4  Q14517  CADHERIN-RELATED TUMOR  6.43e-34
    31    231   12.1     434  11  055139  JAGGED2 PROTEIN (FRAGM  2.04e-33
    32    231   12.1     518  11  070219  JAGGED 2 (JAGGED 2 PRO  2.04e-33
    33    230   12.1     529   5  Q25058  FIBROPELLIN IA (FRAGME  3.64e-33
    34    230   12.1     661   5  061537  SPERM TRANSMEMBRANE PR  3.64e-33
    35    228   12.0     955   4  Q99466  NOTCH4 (FRAGMENT).      1.15e-32
    36    226   11.9    1202  11  P97607  JAGGED2 (FRAGMENT).     3.65e-32
    37    227   11.9    1476  13  Q90285  PUTATIVE EXTRACELLULAR  2.05e-32
    38    223   11.7     762  13  042373  NOTCH RECEPTOR PROTEIN  2.04e-31
    39    219   11.5     721  13  Q91902  X-DELTA-1.              2.02e-30
    40    219   11.5    3871   5  Q20911  ZC116.3 PROTEIN.        2.02e-30
    41    218   11.4     717  13  P87357  DELTAD TRANSMEMBRANE P  3.57e-30
    42    216   11.3     642  13  P79941  NOTCH LIGAND X-DELTA-2  1.12e-29
    43    216   11.3     728  13  Q90656  TRANSMEMBRANE PROTEIN   1.12e-29
    44    216   11.3     802  13  057462  DELTAA.                 1.12e-29
    45    214   11.2     263   4  Q99740  SOLUBLE PROTEIN JAGGED  3.49e-29



RESULT 1

ID 088279 PRELIMINARY; PRT; 1531 AA.
AC 088279;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF4.
GN MEGF4.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011530; D1033423; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 8.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1531 AA; 167497 MW; 5C5EBDF4 CRC32;

Query Match 55.6%; Score 1060; DB 11; Length 1531;
Best Local Similarity 54.6%; Pred. No. 1.45e-265;



Matches 


Db 


1303 


Qy 


14 


Db 


1363 


Qy 


74 


Db 


1423 


Qy 


134 


Db 


1483 



0; Gaps 



IMMIIMIIMMMIIM I I |::IIMI!:I I II III: 



II I III : II 1:11111 l:|::|::lll I :l :| II 



I II I: I: : I |::||:|: |::| III: 



III Mil: 



:|ll|:|:| II II Mil MIIMIMI Mill Mill II 
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Qy 194 LECRGGCAGGQCCGPLRSKRRKYSFECTDGSSFVDEVEKWKCGCARCA 242 



RESULT 2 

ID 075094 PRELIMINARY; PRT; 739 AA.
AC 075094;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF5 (FRAGMENT).
GN MEGF5.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011538; D1033429; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1
SQ SEQUENCE 739 AA; 80364 MW; DC6BCB63 CRC32;

Query Match 52.0%; Score 990; DB 4; Length 739;
Best Local Similarity 51.3%; Pred. No. 2.96e-245;
Matches 123; Conservative 50; Mismatches 62; Indels 5; Gaps 4;
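
The two percentages in each result block appear to follow directly from the other printed figures. As a hedged sketch (the relationships are inferred from the numbers in this report rather than documented here, and both function names are hypothetical): "Query Match" looks like the hit score as a percentage of the Perfect Score (1905 for this query), and "Best Local Similarity" looks like Matches as a percentage of all alignment columns, that is Matches + Conservative + Mismatches + Indels.

```python
def query_match_pct(score, perfect_score):
    """Hit score as a percentage of the query's perfect (self) score."""
    return 100.0 * score / perfect_score


def local_similarity_pct(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all alignment columns.

    Assumption: the alignment length is the sum of the four printed counts.
    """
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# RESULT 2: Score 990 against Perfect Score 1905;
# Matches 123, Conservative 50, Mismatches 62, Indels 5.
print(query_match_pct(990, 1905))
print(local_similarity_pct(123, 50, 62, 5))
```

The same arithmetic reproduces the figures for the other hits, which makes it a convenient way to recover a percent sign or decimal point that was lost in scanning.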

Db 504 LSALRQGTDRPLGGFHGCIHEVRINNELQDFKALPPQSLGVSPGCKSC--TVCKHGLCRS 561 

:::IH: HIIII::: Ihllllh :| I: h III :| II II |:: 
Qy 4 VASLRQAPGENGTSFHGCIRNLYINSELQDFRKMPMQT-GILPGCEPCHKKVCAHGCCQP 62 

Db 562 VEKDSWCECRPGWTGPLCDQEARDPCLGHRCHHGKCVATGT - SYMCKCAEGYGGDLCDN 620 

: III II HUM : llll|::| II |:: : II III I I III: 
Qy 63 SSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHGTCLPINAFSYSCKCLEGHGGVLCDE 122 

Db 621 KNDSANACSAFKCHHGQCHISDQGEPYCLCQPGFSGEHCQQENPCLGQWREVIRRQKGY 680 

:| hi :H II |::| hill I :||:|: h:| :l h :|: ::| II 
Qy 123 EEDLFNPCQMIKCKHGKCRLSGVGQPYCECNSGFTGDSCDREISCRGERIRDYYQRQQGY 182 

Db 681 ASCATASKVPIMECRGGC-GPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGCLACS 739 

hi h lh :|lllll I III I lllllll hlllllllhllh : III h 

Qy 183 AACQTTKKVSRLECRGGCAGGQCCGPLRSKRRKYSFECTDGSSFVDEVEKWKCGCARCA 242 

RESULT 3

ID 088280 PRELIMINARY; PRT; 1523 AA.
AC 088280;
DT 01-NOV-1998 (TREMBLREL. 08, CREATED)
DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE MEGF5.
GN MEGF5.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=SPRAGUE-DAWLEY; TISSUE=BRAIN;
RX MEDLINE; 98360089.
RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011531; D1033424; -.
DR PROSITE; PS01185; CTCK_1; 1.
DR PROSITE; PS01186; EGF_2; 7.
DR PROSITE; PS01187; EGF_CA; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1523 AA; 167767 MW; 2BD845D0 CRC32;

Query Match 51.9%; Score 988; DB 11; Length 1523;
Best Local Similarity 50.8%; Pred. No. 1.12e-244;
Matches 122; Conservative 50; Mismatches 63; Indels 5; Gaps 4;

Db 1288 LSALRQGADRPLGGFHGCIHEVRINNELQDFKALPPQSLGVSPGCKSC-TVCRHGLCRS 1345 

:::Hh: :|||||::: Ihllllh :| |: h III :| II II |:: 
Qy 4 VASLRQAPGENGTSFHGCIRNLYINSELQDFRKMPMQT-GILPGCEPCHKKVCAHGCCQP 62 

Db 1346 VEKDSWCECHPGWTGPLCDQEAQDPCLGHSCSHGTCVATGN-SYVCKCAEGYEGPLCDQ 1404 

: III II HUM : llllh I I J 1 1 r : I! !M I | III: 
Qy 63 SSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHGTCLPINAFSYSCKCLEGHGGVLCDE 122 

Db 1405 KNDSANACSAFKCHHGQCHISDRGEPYCLCQPGFSGNHCEQENPCLGEIVREAIRRQKDY 1464 

:| hi :ll II h:| hill I :||:|: |::| :| II :|: ::| I 
Qy 123 EEDLFNPCQMIKCKHGKCRLSGVGQPYCECNSGFTGDSCDREISCRGERIRDYYQKQQGY 182 

Db 1465 ASCATASKVPIMVCRGGC-GSQCCQPIRSKRRKYVFQCTDGSSFVEEVERHLECGCRECS 1523 

hi h lh : Mill hill hlllllll hlllllllhllh : III h 

Qy 183 AACQTTKKVSRLECRGGCAGGQCCGPLRSKRRKYSFECTDGSSFVDEVEKWKCGCARCA 242 



RESULT 4 

ID 075093 PRELIMINARY; PRT; 79 AA. 

AC 075093; 

DT 01-NOV-1998 (TREMBLREL. 08, CREATED) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE MEGF4 (FRAGMENT).

GN MEGF4. 

OS HOMO SAPIENS (HUMAN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES; 

OC CATARRHINI; HOMINIDAE; HOMO. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE-BRAIN; 

RX MEDLINE; 98360089. 

RA NAKAYAMA M., NAKAJIMA D., NAGASE T., NOMURA N., SEKI N., OHARA O.;
RT "Identification of high-molecular-weight proteins with multiple
RT EGF-like motifs by motif-trap screening.";
RL GENOMICS 51:27-34(1998).
DR EMBL; AB011537; D1033428; -.
DR PROSITE; PS01185; CTCK_1; 1.
FT NON_TER 1 1

SQ SEQUENCE 79 AA; 8809 MW; 96C95FFE CRC32; 

Query Match 21.3%; Score 405; DB 4; Length 79; 

Best Local Similarity 60.8%; Pred. No. 2.61e-79; 

Matches 48; Conservative 15; Mismatches 16; Indels 0; Gaps 0; 

Db 1 ESECRGDPVRDFHQVQRGYAICQTTRPLSWVECRGSCPGQGCCQGLRLKRRKFTFECSDG 60 

I llh :lh I hill Mil: : I : = 1 1 1 1 : 1 : 1 II II lllh:|||:|| 
Qy 164 EISCRGERIRDYYQKQQGYAACQTTKKVSRLECRGGCAGGQCCGPLRSKRRKYSFECTDG 223 

Db 61 TSFAEEVEKPTKCGCALCA 79 

:ii mil urn ii 

Qy 224 SSFVDEVEKWKCGCARCA 242 



RESULT 5 

ID 013149 PRELIMINARY; PRT; 2447 AA.
AC 013149;
DT 01-JUL-1997 (TREMBLREL. 04, CREATED)
DT 01-JUL-1997 (TREMBLREL. 04, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE NOTCH 2 (FRAGMENT).
OS FUGU RUBRIPES (JAPANESE PUFFERFISH).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII;
OC TELEOSTEI; EUTELEOSTEI; ACANTHOPTERYGII; PERCOMORPHA;
OC TETRAODONTIFORMES; TETRAODONTOIDEI; TETRAODONTIDAE; FUGU.
RN [1]
RP SEQUENCE FROM N.A.
RA NAKAMURA T., TROWSDALE J.;
RL SUBMITTED (JUN-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.
DR EMBL; AB004829; D1021371; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 22.
DR PROSITE; PS01186; EGF_2; 29.
DR PROSITE; PS01187; EGF_CA; 20.
DR PFAM; PF00008; EGF; 35.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 3.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1
SQ SEQUENCE 2447 AA; 262542 MW; 3CDA4F7A CRC32;

Query Match 15.4%; Score 294; DB 13; Length 2447;
Best Local Similarity 41.9%; Pred. No. 1.37e-49;
Matches 49; Conservative 22; Mismatches 35; Indels 11; Gaps 11; 

Db 381 CEHGGQC-VNTEGSFTCNCAKGYAGPRCEQDVNE-CASNPCQNDGTCLDRIG-DYSCICM 437 

I II I :::::lll:l I II hi I: I :| I : llll : III |: 
Qy 55 CAHGC-CQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVH-GTCLPINAFSYSCKCL 112 

Db 438 PGFGGTHC-ENE-L-NECLSSPCLNRGKC-LDQVSRFVCECPAGFSGEMCQIDIDEC 490 

I II I 1:1 I I I I ::||| I |:: III :||:|: I: :l I 
Qy 113 EGHGGVLCDEEEDLFNPCQMIKC - KHGKCRLSGVGQPYCECNSGFTGDSCDREIS - C 167 



RESULT 6 

ID Q06008 PRELIMINARY; PRT; 1203 AA. 

AC Q06008; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE NOTCH PROTEIN HOMOLOG 2 (MOTCH B PROTEIN) (FRAGMENT). 

GN NOTCH2 OR MOTCH B. 

OS MUS MUSCULUS (MOUSE) . 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA; 

OC SCIUROGNATHI; MURIDAE; MURINAE; MUS. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=F1 (CBA X C57BL); TISSUE=WHOLE EMBRYO;
RX MEDLINE; 93178563.
RA LARDELLI M., LENDAHL U.;
RT "Motch A and motch B--two mouse Notch homologues coexpressed in a
RT wide variety of tissues.";
RL EXP. CELL RES. 204:364-372(1993).
DR EMBL; X68279; G287990; -.
DR MGD; MGI:97364; NOTCH2.
DR PFAM; PF00008; EGF; 27.
DR PFAM; PF00066; notch; 1.
KW DIFFERENTIATION; NEUROGENESIS; REPEAT.
FT NON_TER 1 1
FT NON_TER 1203 1203
SQ SEQUENCE 1203 AA; 128982 MW; A5A95551 CRC32;

Query Match 15.2%; Score 290; DB 11; Length 1203;

Best Local Similarity 33.3%; Pred. No. 1.52e-48; 

Matches 43; Conservative 34; Mismatches 41; Indels 11; Gaps 10; 

Db 255 DNCDPDPCHHGQCQDGIDS-YTCICNPGYMGAICSDQIDE-CYSSPCLNDGRCIDLVN-G 311 

: I I II II : :| :|| h I ||::| :: :: I :: |:: | |: : ; 
Qy 48 EPCHKKVCAHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVH-GTCLPINAFS 106 

Db 312 YQCNCQPGTSGLNC--EIN-FDDCASNPCMHGVC-VDGINR-YSCVCSPGFTGQRCNIDI 366 

I 1:1 I :|: I I : |: I |:|| I : |: : I I |::||||: |: :| 
Qy 107 YSCKCLEGHGGVLCDEEEDLFNPCQMIKCKHGKCRLSGVGQPY-CECNSGFTGDSCDREI 165 

Db 367 DECASNPCR 375 

I :: I 
Qy 166 S-CRGERIR 173 



RESULT 7 

ID 035516 PRELIMINARY; PRT; 2470 AA. 

AC 035516; 

DT 01-JAN-1998 (TREMBLREL. 05, CREATED)
DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE CELL SURFACE PROTEIN.
GN NOTCH2.
OS MUS MUSCULUS (MOUSE).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; MUS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=C57B/6; TISSUE=THYMUS;
RX MEDLINE; 93178563.
RA LARDELLI M., LENDAHL U.;
RT "Motch A and motch B--two mouse Notch homologues coexpressed in a
RT wide variety of tissues.";
RL EXP. CELL RES. 204:364-372(1993).
RN [2]
RP SEQUENCE FROM N.A.
RC STRAIN=C57B/6; TISSUE=THYMUS;
RA HAMADA Y., HIGUCHI M., TSUJIMOTO Y.;
RL SUBMITTED (JUL-1994) TO EMBL/GENBANK/DDBJ DATA BANKS.
DR EMBL; D32210; D1022953; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 22.
DR PROSITE; PS01186; EGF_2; 27.
DR PROSITE; PS01187; EGF_CA; 22.
DR PFAM; PF00008; EGF; 34.
DR PFAM; PF00023; ank; 6.
DR PFAM; PF00066; notch; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 2470 AA; 265325 MW; CA94E03A CRC32;

Query Match 15.2%; Score 290; DB 11; Length 2470;
Best Local Similarity 33.3%; Pred. No. 1.52e-48;

Matches 43; Conservative 34; Mismatches 41; Indels 11; Gaps 10; 

Db 570 DNCDPDPCHHGQCQDGIDS-YTCICNPGYMGAICSDQIDE-CYSSPCLNDGRCIDLVN-G 626 

: I I II II : :| :|| |: I ||::| :: :: I :: |:: I |: : : 
Qy 48 EPCHKKVCAHGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVH-GTCLPINAFS 106 

Db 627 YQCNCQPGTSGLNC-EIN-FDDCASNPCMHGVC-VDGINR-YSCVCSPGFTGQRCNIDI 681 

I 1:1 I :|: I I : I: I |:|| I : |: : I I |::||||: |: :| 
Qy 107 YSCKCLEGHGGVLCDEEEDLFNPCQMIKCKHGKCRLSGVGQPY-CECNSGFTGDSCDREI 165 

Db 682 DECASNPCR 690 

I :: I 
Qy 166 S-CRGERIR- 173 



RESULT 8 

ID Q19350 PRELIMINARY; PRT; 1722 AA. 

AC Q19350; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED)
DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE SIMILAR TO EGF-LIKE REPEATS. NCBI GI: 1125776.
GN F11C7.4.
OS CAENORHABDITIS ELEGANS.
OC EUKARYOTA; METAZOA; NEMATODA; SECERNENTEA; RHABDITIA; RHABDITIDA;
OC RHABDITINA; RHABDITOIDEA; RHABDITIDAE; PELODERINAE; CAENORHABDITIS.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=BRISTOL N2;
RX MEDLINE; 94150718.
RA WILSON R., AINSCOUGH R., ANDERSON K., BAYNES C., BERKS M.,
RA BONFIELD J., BURTON J., CONNELL M., COPSEY T., COOPER J., COULSON A.,
RA CRAXTON M., DEAR S., DU Z., DURBIN R., FAVELLO A., FULTON L.,
RA GARDNER A., GREEN P., HAWKINS T., HILLIER L., JIER M., JOHNSTON L.,
RA JONES M., KERSHAW J., KIRSTEN J., LAISTER N., LATREILLE P.,
RA LIGHTNING J., LLOYD C., MCMURRAY A., MORTIMORE B., O'CALLAGHAN M.,
RA PARSONS J., PERCY C., RIFKEN L., ROOPRA A., SAUNDERS D., SHOWNKEEN R.,
RA SMALDON N., SMITH A., SONNHAMMER E., STADEN R., SULSTON J.,
RA THIERRY-MIEG J., THOMAS K., VAUDIN M., VAUGHAN K., WATERSTON R.,
RA WATSON A., WEINSTOCK L., WILKINSON-SPROAT J., WOHLDMAN P.;
RT "2.2 Mb of contiguous nucleotide sequence from chromosome III of C.
RT elegans.";
RL NATURE 368:32-38(1994).
RN [2]
RP SEQUENCE FROM N.A.
RC STRAIN=BRISTOL N2;
RA TAICH A., VETTER J.;
RL SUBMITTED (JAN-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.
DR EMBL; U42839; G1125776; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 5.
DR PROSITE; PS01186; EGF_2; 19.
DR PROSITE; PS01187; EGF_CA; 3.
DR PFAM; PF00008; EGF; 24.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1722 AA; 188383 MW; CCFB86B8 CRC32;

Query Match 14.2%; Score 270; DB 5; Length 1722;
Best Local Similarity 29.1%; Pred. No. 2.38e-43;
Matches 37; Conservative 34; Mismatches 48; Indels 8;

Db 1252 DECEGVECHNGGKCVKNRSEKIVCQCGNSWMGDSCNVTKTTNCKDSPCQNFGQCMQKTDT 1311
: I hi I : : 1:1 ::||| h I : I : I |: :: 
Qy 48 EPCHKKVCAHGC-CQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVH-GTCLP-INA 104 

Db 1312 F - FECNCMDG YSGELC - EQRDV • NECNHYDCNRGHCVMT ■ VSGPACQCEMGYTGRFCEKL 1367 

I : l:|::| -\ II h |: I I |::| I :: |: I |:|: |:|| |:: 
Qy 105 FSYSCKCLEGHGGVLCDEEEDLFNPCQMIKCKHGKCRLSGVGQPYCECNSGFTGDSCDRE 164 



Gaps 



Db 1368 LNQCSSN 1374 

;: | :: 
Qy 165 IS-CRGE 170 



RESULT 9

ID Q63722 PRELIMINARY; PRT; 1219 AA.
AC Q63722; P70640;
DT 01-NOV-1996 (TREMBLREL. 01, CREATED)
DT 01-FEB-1997 (TREMBLREL. 02, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE JAGGED PROTEIN.
OS RATTUS NORVEGICUS (RAT).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; RODENTIA;
OC SCIUROGNATHI; MURIDAE; MURINAE; RATTUS.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=SCIATIC NERVE;
RX MEDLINE; 95211842.
RA LINDSELL C.E., SHAWBER C.J., BOULTER J., WEINMASTER G.;
RT "Jagged: a mammalian ligand that activates Notch1.";
RL CELL 80:909-917(1995).
DR EMBL; L38483; 61492111; -.
DR PROSITE; PS01186; EGF_2; 12.
DR PROSITE; PS01187; EGF_CA; 8.
DR PFAM; PF00008; EGF; 14.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1219 AA; 134325 MW; B193F948 CRC32;

Query Match 14.1%; Score 269; DB 11; Length 1219;
Best Local Similarity 37.8%; Pred. No. 4.30e-43;
Matches 45; Conservative 19; Mismatches 45; Indels 10; Gaps



Db 597 NVCGPHGKCKSESGGKFTCDCNKGFTGTYCHENIND-CEGNPCTNGGTCID-GVNSYKCI 654 

:H: Ml : I : ll|:|: I I I : 1 1 I 1 1 I : | 1 1 : I 
Qy 53 KVCA-HGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 110 

Db 655 CSDGWEGAHC--ENNI-NDCSQNPCHYGGTCR-DLVNDFYCDCKNGWKGKTCHSRDSQC 709 

'MM I::: II I Ml I : l|:|::| M |: 1 
Qy 111 CLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPYCECNSGFTGDSCD-REISC 167 



RESULT 10 

ID 042374 PRELIMINARY; PRT; 752 AA. 

AC 042374; 

DT 01-JAN-1998 (TREMBLREL. 05, CREATED) 

DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE NOTCH RECEPTOR PROTEIN (FRAGMENT).
GN NOTCH6.
OS BRACHYDANIO RERIO (ZEBRAFISH) (ZEBRA DANIO).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ACTINOPTERYGII; NEOPTERYGII;
OC TELEOSTEI; EUTELEOSTEI; OSTARIOPHYSI; CYPRINIFORMES; CYPRINOIDEA;
OC CYPRINIDAE; RASBORINAE; DANIO.
RN [1]
RP SEQUENCE FROM N.A.
RA WESTIN J., LARDELLI M.;
RL DEV. GENES EVOL. 207:51-63(1997).
DR EMBL; Y10354; E293438; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 9.
DR PROSITE; PS01186; EGF_2; 15.
DR PROSITE; PS01187; EGF_CA; 7.
DR PFAM; PF00008; EGF; 16.
DR PFAM; PF00066; notch; 2.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1
FT NON_TER 752 752
SQ SEQUENCE 752 AA; 82103 MW; 72E254FB CRC32;

Query Match 13.6%; Score 260; DB 13; Length 752;
Best Local Similarity 35.2%; Pred. No. 8.90e-41;

Matches 44; Conservative 25; Mismatches 43; Indels 13; Gaps 12; 

Db 64 PCASQPCQRRGVCQPSLDYTSYTCKCHSGWEGAQCTEDKDE-CKKSPCQNGARCVNIVG- 121 

II I : I llll : :::|| I MM: :; | : | :|: |: I : 
Qy 49 PCHKKVCAH - GCCQ PSSQ - SGFTCEC EEGWMG PLCDQRT NDPCLGNKCVHGT - CLP I NAF 105 

Db 122 SYRCECPPGYSGDNCQTHID-D-CSSNPCRNGGTC-VDKVGR-YLCECRAGFYGERCEE 176 

II I I M I: Ml I::| I : IM III :|| |: |: 
Qy 106 SYSCKCLEGHGGVLCDEEEDLFNPCQMIKCKHG - KCRLSGVGQPY - CECNSG FTGDSCDR 163 

Db 177 EVDEC 181 

I: I 

Qy 164 EIS-C 167 



RESULT 11

ID 014902 PRELIMINARY; PRT; 1218 AA.
AC 014902;
DT 01-JAN-1998 (TREMBLREL. 05, CREATED)
DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE TRANSMEMBRANE PROTEIN JAGGED 1.
GN HJ1.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 95211842.
RA LINDSELL C.E., SHAWBER C.J., BOULTER J., WEINMASTER G.;
RT "Jagged: a mammalian ligand that activates Notch1.";
RL CELL 80:909-917(1995).
RN [2]
RP SEQUENCE FROM N.A.
RA BASH J., ZONG W.-X., GELINAS C.;
RL SUBMITTED (OCT-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.
DR EMBL; AF028593; G2599082; -.
DR PROSITE; PS01186; EGF_2; 12.
DR PROSITE; PS01187; EGF_CA; 8.
DR PFAM; PF00008; EGF; 14.
KW TRANSMEMBRANE; GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1218 AA; 133797 MW; 07B97EE3 CRC32;

Query Match 13.5%; Score 258; DB 4; Length 1218;



Tue Jun 1 10:16:09 1999 



US-09-191-647-14.rspt 



Page 5 



Best Local Similarity 37.0%; Pred. No. 2.90e-40; 

Matches 44; Conservative 20; Mismatches 45; Indels 10; Gaps 9; 

Db 597 NVCGPHGKCKSQSGGKFTCDCNKGFTGTYCHENIND-CESNPCRNGGTCID-GVNSYKCI 654 

:M: 111:1: ll|:|: I I I : 1 1 I : I I : I 1 1 : III 
Qy 53 KVCA-HGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 110 

Db 655 CSDGWEGAYC- -ETNI -NDCSQNPCHNGGTCR-DLVNDFYCDCKNGWKGKTCHSRDSQC 709 

I :| I I I :: I I Il:|::| I :| |: I 

Qy 111 CLEGHGGVLCDEEEDLFNPCQMIKCKHG - KCRLSGVGQPYCECNSGFTGDSCD -RE ISC 167 



RESULT 12 

ID Q15816 PRELIMINARY; PRT; 1218 AA. 

AC Q15816; 

DT 01-NOV-1996 (TREMBLREL. 01, CREATED) 

DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE TRANSMEMBRANE PROTEIN JAGGED 1.
GN HJ1.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 95211842.
RA LINDSELL C.E., SHAWBER C.J., BOULTER J., WEINMASTER G.;
RT "Jagged: a mammalian ligand that activates Notch1.";
RL CELL 80:909-917(1995).
RN [2]
RP SEQUENCE FROM N.A.
RA GRAY G.E., MANN R.S., MITSIADIS E., HENRIQUE D., CARACANGIU M.,
RA ISH-HOROWICZ D., ARTAVANIS-TSAKONAS S.;
RL SUBMITTED (JUN-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.
DR EMBL; U61276; G1438937; -.
DR PROSITE; PS01186; EGF_2; 12.
DR PROSITE; PS01187; EGF_CA; 8.
DR PFAM; PF00008; EGF; 14.
KW TRANSMEMBRANE; GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1218 AA; 133738 MW; E8D64FED CRC32;

Query Match 13.5%; Score 258; DB 4; Length 1218;
Best Local Similarity 37.0%; Pred. No. 2.90e-40;

Matches 44; Conservative 20; Mismatches 45; Indels 10; Gaps 9; 

Db 597 NVCGPHGKCRSQSGGKFTCDCNKGFTGTYCHENIND-CESNPCRNGGTCID-GVNSYKCI 654
Wl II I : I : I M : I : I I I : 1 1 I : I I : I 1 1 : III 

Qy 53 KVCA-HGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 110

Db 655 CSDGWEGAYC--ETNI-NDCSQNPCHNGGTCR-DLVNDFYCDCKNGWKGKTCHSRDSQC 709

I :| I I I :: I I I :| II I : lhl::| I :| |: I 
Qy 111 CLEGHGGVLCDEEEDLFNPCQMIKCKHG - KCRLSGVGQPYCECNSGFTGDSCD • RE ISC 167 



RESULT 13 

ID 015122 PRELIMINARY; PRT; 1218 AA. 

AC 015122; 

DT 01-JAN-1998 (TREMBLREL. 05, CREATED) 

DT 01-JAN-1998 (TREMBLREL. 05, LAST SEQUENCE UPDATE) 

DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE) 

DE JAGGED1.
GN JAG1.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RA ODA T., ELKAHLOUN A.G., PIKE B.L., OKAJIMA K., KRANTZ I.D., GENIN A.,
RA PICCOLI D.A., MELTZER P.S., SPINNER N.B., COLLINS F.S.,
RA CHANDRASEKHARAPPA S.C.;
RL NAT. GENET. 0:0-0(1997).
RN [2]
RP SEQUENCE FROM N.A.
RX MEDLINE; 97422615.
RA ODA T., ELKAHLOUN A.G., MELTZER P.S., CHANDRASEKHARAPPA S.C.;
RT "Identification and cloning of the human homolog (JAG1) of the rat
RT Jagged1 gene from the Alagille syndrome critical region at 20p12.";
RL GENOMICS 43:376-379(1997).
DR EMBL; AF003837; G2228793; -.
DR PROSITE; PS01186; EGF_2; 12.
DR PROSITE; PS01187; EGF_CA; 8.
DR PFAM; PF00008; EGF; 14.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
SQ SEQUENCE 1218 AA; 133858 MW; 20F471DB CRC32;

Query Match 13.5%; Score 258; DB 4; Length 1218;
Best Local Similarity 37.0%; Pred. No. 2.90e-40;

Matches 44; Conservative 20; Mismatches 45; Indels 10; Gaps 9; 

Db 597 NVCGPHGKCKSQSGGKFTCDCNKGFTGT YCHENIND -CESNPCRNGGTC ID - GVNSY KC I 654 

:lh II I : I : 1 1 1 : 1 : I I I : 1 1 I : I I : I 1 1 : III 
Qy 53 KVCA-HGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 110 

Db 655 CSDGWEGAYC- -ETNI-NDCSQNPCHNGGTCR-DLVNDFYCDCKNGWKGKTCHSRDSQC 709 

I :M II :: II Ml I : ||:|::| Ml: 
Qy 111 CLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPYCECNSGFTGDSCD-REISC 167 



RESULT 14 

ID P78504 PRELIMINARY; PRT; 1227 AA. 

AC P78504; 

DT 01-MAY-1997 (TREMBLREL. 03, CREATED)
DT 01-MAY-1997 (TREMBLREL. 03, LAST SEQUENCE UPDATE)
DT 01-JAN-1999 (TREMBLREL. 09, LAST ANNOTATION UPDATE)
DE JAGGED 1 (TRANSMEMBRANE PROTEIN JAGGED).
GN DJ1.
OS HOMO SAPIENS (HUMAN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; MAMMALIA; EUTHERIA; PRIMATES;
OC CATARRHINI; HOMINIDAE; HOMO.
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE; 95211842.
RA LINDSELL C.E., SHAWBER C.J., BOULTER J., WEINMASTER G.;
RT "Jagged: a mammalian ligand that activates Notch1.";
RL CELL 80:909-917(1995).
RN [2]
RP SEQUENCE FROM N.A.
RA LI L., DENG Y., BANTA A.B., HOOD L.;
RL SUBMITTED (DEC-1996) TO EMBL/GENBANK/DDBJ DATA BANKS.
RN [3]
RP SEQUENCE OF 14-1227 FROM N.A.
RX MEDLINE; 97115768.
RA ZIMRIN A.B., PEPPER M.S., MCMAHON G., NGUYEN P., MONTESANO R.,
RA MACIAG T.;
RT "An antisense oligonucleotide to the notch ligand jagged enhances
RT fibroblast growth factor-induced angiogenesis in vitro.";
RL J. BIOL. CHEM. 271:32499-32502(1996).
RN [4]
RP REVISIONS TO 14-1227.
RA ZIMRIN A.B., NGUYEN P., MACIAG T.;
RL SUBMITTED (MAY-1997) TO EMBL/GENBANK/DDBJ DATA BANKS.
DR EMBL; U73936; G1695274; -.
DR EMBL; U77720; G2130537; -.
DR PFAM; PF00008; EGF; 14.
KW TRANSMEMBRANE.
FT CONFLICT 1187 1227
FT QRHADKTPKLDKQTGQQRLGKCPELKPNGVHRIADRGHCR
FT R -> NGTPTKHPNWTNKQDNRDLESAQSLNRMEYIV
FT (IN REF. 1 AND 2).
SQ SEQUENCE 1227 AA; 134770 MW; 5D300B81 CRC32;

Query Match 13.5%; Score 258; DB 4; Length 1227;
Best Local Similarity 37.0%; Pred. No. 2.90e-40;
Matches 44; Conservative 20; Mismatches 45; Indels 10; Gaps 9; 

Db 597 NVCGPHGKCKSQSGGKFTCDCNKGFTGTYCHENIND-CESNPCRNGGTCID-GVNSYKCI 654 
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HI: II I : I : lll:|: I I I : 1 1 I : I I : I 1 1 : ||| 
Qy 53 KVCA-HGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 110 

Db 655 CSDGWEGAYC - - ETNI ■ NDCSQNPCHNGGTCR- DLVNDFYCDCKNGWKGKTCHSRDSQC 709 

I :| I I I :: I I I :| II I : l|:|::| I :| |: I 
Qy 111 CLEGHGGVLCDEEEDLFNPCQMIKCKHG-KCRLSGVGQPYCECNSGFTGDSCD-REISC 167 



RESULT 15 

ID Q90819 PRELIMINARY; PRT; 1193 AA.
AC Q90819;
DT 01-NOV-1996 (TREMBLREL. 01, CREATED)
DT 01-NOV-1996 (TREMBLREL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1998 (TREMBLREL. 08, LAST ANNOTATION UPDATE)
DE C-SERATE-1 PROTEIN (FRAGMENT).
OS GALLUS GALLUS (CHICKEN).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; ARCHOSAURIA; AVES;
OC NEOGNATHAE; GALLIFORMES; PHASIANIDAE; PHASIANINAE; GALLUS.
RN [1]
RP SEQUENCE FROM N.A.
RC TISSUE=SPINAL CORD;
RX MEDLINE; 96175595.
RA MYAT A., HENRIQUE D., ISH-HOROWICZ D., LEWIS J.;
RT "A chick homologue of Serrate and its relationship with Notch and
RT Delta homologues during central neurogenesis.";
RL DEV. BIOL. 174:233-247(1996).
DR EMBL; X95283; E224084; -.
DR PROSITE; PS00010; ASX_HYDROXYL; 10.
DR PROSITE; PS01186; EGF_2; 12.
DR PROSITE; PS01187; EGF_CA; 8.
DR PFAM; PF00008; EGF; 14.
KW GLYCOPROTEIN; EGF-LIKE DOMAIN.
FT NON_TER 1 1
SQ SEQUENCE 1193 AA; 131039 MW; 55E5FCD1 CRC32;

Query Match 13.4%; Score 256; DB 13; Length 1193;
Best Local Similarity 37.0%; Pred. No. 9.42e-40;

Matches 44; Conservative 21; Mismatches 44; Indels 10; Gaps 9; 

Db 571 PCGPHGKCKSQAGGKFTCECNKGFTGTYCHENIND-CESNPCKNGGTCID-GVNSYKCI 628 

:H: III::: Mill: II I : 1 1 I : M : I 1 1 : III 
Qy 53 KVCA-HGCCQPSSQSGFTCECEEGWMGPLCDQRTNDPCLGNKCVHG-TCLPINAFSYSCK 110 

Db 629 CSDGWEGTYC-ETNI-NDCSKNPCHNGGTCR-DLVNDFFCECKNGWKGKTCHSRDSQC 683 

M I I I =: I I : I :| II :lll::| I :| |: I 
Qy 111 CLEGHGGVLCDEEEDLFNPCQM I KCKHG - KCRLSGVGQP YCECNSGFTGDSCD - RE I SC 167 



Search completed: Fri May 28 09:42:55 1999 
Job time : 53 secs.
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^^rchjn n.a. • n.a. database search, using Smith-Waterman algorithm 
Run on : , Sat May 29 : 
Tabular output not generated 



13:20 1999; MasPar time 8535,17 Seconds 

1545.223 Million cell updates/sec 



Title:            US-09-191-647-1
Description:      (1-4758) from US09191647.seq
Perfect Score:    4758
N.A. Sequence:    1 atgcgcggcgttggctggca aaaaaaaaaaaaaaactcga 4758
Comp:               tacgcgccgcaaccgaccgt tttttttttttttttgagct

Scoring table:    TABLE default
                  Gap 6

Nmatch STD:       Dbase 0; Query 0
Searched:         646147 seqs, 1385953633 bases x 2

Post-processing:  Minimum Match 0%
                  Listing first 45 summaries

Database:         embl58
                  1:em_ba1 2:em_ba2 3:em_fun 4:em_htg 5:em_hum1 6:em_hum2
                  7:em_in 8:em_om 9:em_or 10:em_ov 11:em_pat 12:em_ph
                  13:em_pl 14:em_ro 15:em_sts 16:em_vi
                  genbank111
                  17:gb_ba1 18:gb_ba2 19:gb_htg1 20:gb_htg2 21:gb_in1
                  22:gb_in2 23:gb_om 24:gb_ov 25:gb_pat 26:gb_ph 27:gb_pl1
                  28:gb_pl2 29:gb_pr1 30:gb_pr2 31:gb_pr3 32:gb_ro
                  33:gb_st 34:gb_sts 35:gb_sy 36:gb_un 37:gb_vi

Statistics: Mean 12.589; Variance 6.522; scale 1.930 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution.



Result           Query
   No.   Score   Match  Length  DB  ID           Description             Pred. No.

     1    4588    96.4    5233  31  AF055585     Homo sapiens neurogeni  0.00e+00
     2    4502    94.6    4950  29  AB017168     Homo sapiens mRNA for   0.00e+00
     3    2498    52.5    5121  32  AF074960     Mus musculus neurogeni  0.00e+00
     4    1249    26.3    5210  32  AB011531     Rattus norvegicus mRNA  0.00e+00
     5    1231    25.9    5094  29  AB017167     Homo sapiens mRNA for   0.00e+00
     6    1202    25.3    5015  29  AB017169     Homo sapiens mRNA for   0.00e+00
     7    1177    24.7    4950  32  AB011530     Rattus norvegicus mRNA  0.00e+00
     8     505    10.6    2553  31  AF075240     Homo sapiens SLIT1 pro  0.00e+00
     9     469     9.9    6921  29  AB011538     Homo sapiens mRNA for   0.00e+00
    10     182     3.8     591  32  AB017170     Rattus norvegicus mRNA  1.15e-110
    11     106     2.2    5401  21  DMSLIT       Drosophila mRNA for sl  2.11e-52
 c  12      80     1.7    7218  25  I66494       Sequence 14 from paten  1.95e-33
    13      68     1.4     297  34  G41330       Z1154 Zebrafish AB Dan  5.24e-25
    14      67     1.4     721  32  AM8RQ09      Mus musculus SLIT1 pro  2.56e-24
    15      57   9 791ft 9S         ICO*-)*!     Sequence 14 from paten
    16      58   9 OftfiQ^ 99       nLUUJJJU     Drosophila melanogaste  3.23e-18
    17      50     1.1     491  31  HUMZE12G05   Homo earn ana full lonrr  5.38e-13
    18      50   1 ftdfii' 19                    M. musculus notch'l mRN
    19                              nbuy /boy    Homo sapiens Notch3 (N  1.40e-15
    20      46     1.0    7943  32  MVNOTI" 1    M.musculus mRNA for No
    21             1.0    8221  32  KKNUiln      R.rattus mRNA homologu  1.00e-11
    22      47   n fism 99          1A>U JO? / / Lucilia cuprina Notch
    23      49     1.0   74371  31  irnnsifiQ    Homo sapiens chromosom  M3e-12
    24      42     0.9     965  25  AR(19499Q    Sequence 22 from paten  5.00e-08
    25      45   .3 1120 JU         UQWTCHM      Homo sapiens Notch3 (N
    26      44     0.9    1674  21  nBflcrjTPnP  Drosophila melanogaste  3.06e-09
    27      41     0.9    7332  29  HUMTAN1      Human TAN-1 mRNA (homo  1.98e-07
    28      41     0.9    7471  24  BRNOTCH      B.rerio Notch mRNA      1.98e-07
    29      41     0.9    9166  24  XELXOTCH     AilacVlo AULLU ^lULclIl 1.98e-07
    30      44     0.9   30088  21  CEC26G2      Caenorhabditis elegans  3.06e-09
 c  31      45     0.9   41150  30  AC004663     Homo sapiens chromosom  7.44e-10
    32      37     0.8     215  25  I28278       Sequence 5 from patent  4.28e-05
    33      37     0.8     406  23  AF020290     Oryctolagus cuniculus   4.28e-05
    34      36     0.8     965  25  AR024229     Sequence 22 from paten  1.58e-04
    35      36     0.8    1056  23  MVU87256     Mustela vison GT dinuc  1.58e-04
 c  36      40     0.8    1056  23  MVU87256     Mustela vison GT dinuc  7.77e-07
    37      38     0.8    2256  24  DRNOTCH6     D.rerio notch6 mRNA.    1.14e-05
    38      38     0.8    3314  29  AB011537     Homo sapiens mRNA for   1.14e-05
    39      39     0.8    8287  32  RATNOTCHX    Rat notch 2 mRNA.       3.00e-06
    40      38     0.8   13692  21  CELLIN12A    C.elegans homeotic lin  1.14e-05
 c  41      38     0.8   40970  21  CER107       Caenorhabditis elegans  1.14e-05
    42      35     0.7    2087  29  HUMPGH3A     Human pgH3 mRNA for pr  5.71e-04
    43      35     0.7    3162  30  AF003522     Homo sapiens Delta mRN  5.71e-04
 c  44      35     0.7   10772  21  AF012089     Drosophila melanogaste  5.71e-04
    45      35     0.7   17137  21  DRONOTCH03   D.melanogaster Notch 1  5.71e-04



RESULT 1
LOCUS       AF055585 5233 bp mRNA PRI 04-MAR-1999
DEFINITION  Homo sapiens neurogenic extracellular slit protein Slit2 mRNA,
            complete cds.
ACCESSION   AF055585
NID         g4151204
VERSION     AF055585.1 GI:4151204
KEYWORDS    .
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria;
            Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 5233)
  AUTHORS   Holmes,G.P., Negus,K., Burridge,L., Raman,S., Algar,E., Yamada,T.
            and Little,M.H.
  TITLE     Distinct but overlapping expression patterns of two vertebrate slit
            homologs implies functional roles in CNS development and
            organogenesis
  JOURNAL   Mech. Dev. 79 (1-2), 57-72 (1998)
REFERENCE   2 (bases 1 to 5233)
  AUTHORS   Holmes,G.P., Burridge,L., Negus,K., Raman,S. and Little,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (25-MAR-1998) Center for Molecular and Cellular Biology,
            University of Queensland, St. Lucia, Brisbane, QLD 4072, Australia
FEATURES             Location/Qualifiers
     source          1..5233
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="4"
                     /map="4p16.3"
                     /dev_stage="fetal"
                     /tissue_type="kidney; brain"
     CDS             557..5122
                     /note="leucine-rich repeat EGF-like repeat secreted
                     protein; similar to Drosophila neurogenic extracellular
                     slit protein; expressed in developing CNS, limbs and
                     metanephros"
                     /codon_start=1
                     /product="neurogenic extracellular slit protein Slit2"
                     /protein_id="AAD04309.1"
                     /db_xref="PID:g4151205"
                     /db_xref="GI:4151205"

                     /translation="MRGVGWQMLSLSLGLVLAILNKVAPQACPAQCSCSGSTVDCHGL
                     ALRSVPRNIPRNTERLDLNGNNITRITKTDFAGLRHLRVLQLMENKISTIERGAFQDL
                     KELERLRLNRNHLQLFPELLFLGTAKLYRLDLSENQIQAIPRKAFRGAVDIKNLQLDY
                     NQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPKLRTFRLHSNNLYCDCHLA
                     WLSDWLRQRPRVGLYTQCMGPSHLRGHNVAEVQKREFVCSGHQSFMAPSCSVLHCPAA
                     CTCSNNIVDCRGKGLTEIPTNLPETITEIRLEQNTIKVIPPGAFSPYKKLRRIDLSNN
                     QISELAPDAFQGLRSLNSLVLYGNKITELPKSLFEGLFSLQLLLLNANKINCLRVDAF
                     QDLHNLNLLSLYDNKLQTIAKGTFSPLRAIQTMHLAQNPFICDCHLKWLADYLHTNPI
                     ETSGARCTSPRRLANKRIGQIKSKKFRCSGTEDYRSKLSGDCFADLACPEKCRCEGTT
                     VDCSNQKLNKIPEHIPQYTAELRLNNNEFTVLEATGIFKKLPQLRKINFSNNKITDIE
                     EGAFEGASGVNEILLTSNRLENVQHKMFKGLEKPQNLMLRSNRITCVGNDSFIGLSSV
                     RMLSLYDNQITTVAPGAFDTLHSLSTLNLLANPFNCNCYLAWLGEWLRKKRIVTGNPR
                     CQKPYFLKEIPIQDVAIQDFTCDDGNDDNSCSPLSRCPTECTCLDTVVRCSNKGLKVL
                     PKGIPRDVTELYLDGNQFTLVPKELSNYKHLTLIDLSNNRISTLSNQSFSNMTQLLTL
                     ILSYNRLRCIPPRTFDGLKSLRLLSLHGNDISVVPEGAFNDLSALSHLAIGANPLYCD
                     CNMQWLSDWVKSEYKEPGIARCAGPGEMADKLLLTTPSKKFTCQGPVDVNILAKCNPC
                     LSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTCHLKEGEED
                     GFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTGELCEEKLD
                     FCAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNGAHCTDAVN
                     GYTCICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQCLPGYQGE
                     KCEKLVSVNFINKESYLQIPSAKVRPQTNITLQIATDEDSGILLYKGDKDHIAVELYR
                     GRVRASYDTGSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPKIITNLSKQ
                     STLNFDSPLYVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQDFQKVPMQTG
                     ILPGCEPCHKKVCAHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGNKCVHGTCL
                     PINAFSYSCKCLEGHGGVLCDEEEDLFNPCQAIKCKHGKCRLSGLGQPYCECSSGYTG
                     DSCDREISCRGERIRDYYQKQQGYAACQTTKKVSRLECRGGCAGGQCCGPLRSKRRKY
                     SFECTDGSSFVDEVEKVVKCGCTRCVS"

     misc_feature    638..730
                     /note="Region: amino-flanking"
     misc_feature    734..766
                     /note="Region: leucine-rich region 1-1"
     misc_feature    779..838
                     /note="Region: leucine-rich region 1-2"
     misc_feature    851..907
                     /note="Region: leucine-rich region 1-3"
     misc_feature    923..982
                     /note="Region: leucine-rich region 1-4"
     misc_feature    995..1054
                     /note="Region: leucine-rich region 1-5"
     misc_feature    1067..1126
                     /note="Region: leucine-rich region 1-6"
     misc_feature    1139..1174
                     /note="Region: leucine-rich region 1-7"
     misc_feature    1181..1372
                     /note="Region: carboxy-flanking"
     misc_feature    1373..1465
                     /note="Region: amino-flanking"
     misc_feature    1469..1501
                     /note="Region: leucine-rich region 2-1"
     misc_feature    1514..1573
                     /note="Region: leucine-rich region 2-2"
     misc_feature    1586..1645
                     /note="Region: leucine-rich region 2-3"
     misc_feature    1658..1717
                     /note="Region: leucine-rich region 2-4"
     misc_feature    1730..1789
                     /note="Region: leucine-rich region 2-5"
     misc_feature    1802..1837
                     /note="Region: leucine-rich region 2-6"
     misc_feature    1844..2047
                     /note="Region: carboxy-flanking"
     misc_feature    2048..2140
                     /note="Region: amino-flanking"
     misc_feature    2144..2176
                     /note="Region: leucine-rich region 3-1"
     misc_feature    2192..2251
                     /note="Region: leucine-rich region 3-1A"
     misc_feature    2264..2323
                     /note="Region: leucine-rich region 3-2"
     misc_feature    2336..2395
                     /note="Region: leucine-rich region 3-3"
     misc_feature    2408..2467
                     /note="Region: leucine-rich region 3-4"
     misc_feature    2480..2515
                     /note="Region: leucine-rich region 3-5"
     misc_feature    2522..2710
                     /note="Region: carboxy-flanking"
     misc_feature    2711..2803
                     /note="Region: amino-flanking"
     misc_feature    2807..2839
                     /note="Region: leucine-rich region 4-1"
     misc_feature    2849..2908
                     /note="Region: leucine-rich region 4-2"
     misc_feature    2921..2980
                     /note="Region: leucine-rich region 4-3"
     misc_feature    2993..3052
                     /note="Region: leucine-rich region 4-4"
     misc_feature    3065..3100
                     /note="Region: leucine-rich region 4-5"
     misc_feature    3107..3280
                     /note="Region: carboxy-flanking"
     misc_feature    3284..3397
                     /note="Region: EGF-like repeat 1"
     misc_feature    3401..3520
                     /note="Region: EGF-like repeat 2"
     misc_feature    3524..3634
                     /note="Region: EGF-like repeat 3"
     misc_feature    3638..3754
                     /note="Region: EGF-like repeat 4"
     misc_feature    3758..3868
                     /note="Region: EGF-like repeat 5"
     misc_feature    3893..4003
                     /note="Region: EGF-like repeat 6"
     misc_feature    4520..4636
                     /note="Region: EGF-like repeat 6A"
     misc_feature    4643..4753
                     /note="Region: EGF-like repeat 7"
     misc_feature    4766..4876
                     /note="Region: EGF-like repeat 8"
     misc_feature    4976..5119
                     /note="Region: CTC knot"



BASE COUNT 1393 a 1253 c 1264 g 1322 t 1 others 
ORIGIN 

Query Match 96.4%; Score 4588; DB 31; Length 5233; 

Best Local Similarity 99.7%; Pred. No. 0.00e+00;

Matches 4674; Conservative 0; Mismatches 2; Indels 14; Gaps 3; 

Db     557 ATGCGCGGCGTTGGCTGGCAGATGCTGTCCCTGTCGCTGGGGTTAGTGCTGGCGATCCTG 616
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy       1 atgcgcggcgttggctggcagatgctgtccctgtcgctggggttagtgctggcgatcctg 60

Db     617 AACAAGGTGGCACCGCAGGCGTGCCCGGCGCAGTGCTCTTGCTCGGGCAGCACAGTGGAC 676
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy      61 aacaaggtggcaccgcaggcgtgcccggcgcagtgctcttgctcgggcagcacagtggac 120

Db     677 TGTCACGGGCTGGCGCTGCGCAGCGTGCCCAGGAATATCCCCCGCAACACCGAGAGACTG 736
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     121 tgtcacgggctggcgctgcgcagcgtgcccaggaatatcccccgcaacaccgagagactg 180

Db     737 GATTTAAATGGAAATAACATCACAAGAATTACGAAGACAGATTTTGCTGGTCTTAGACAT 796
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     181 gatttaaatggaaataacatcacaagaattacgaagacagattttgctggtcttagacat 240

Db     797 CTAAGAGTTCTTCAGCTTATGGAGAATAAGATTAGCACCATTGAAAGAGGAGCATTCCAG 856
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     241 ctaagagttcttcagcttatggagaataagattagcaccattgaaagaggagcattccag 300

Db     857 GATCTTAAAGAACTAGAGAGACTGCGTTTAAACAGAAATCACCTTCAGCTGTTTCCTGAG 916
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     301 gatcttaaagaactagagagactgcgtttaaacagaaatcaccttcagctgtttcctgag 360
Db     917 TTGCTGTTTCTTGGGACTGCGAAGCTATACAGGCTTGATCTCAGTGAAAACCAAATTCAG 976
Qy     361 ttgctgtttcttgggactgcgaagctatacaggcttgatctcagtgaaaaccaaattcag 420

Db     977 GCAATCCCAAGGAAAGCTTTCCGTGGGGCAGTTGACATAAAAAATTTGCAACTGGATTAC 1036
Qy     421 gcaatcccaaggaaagctttccgtggggcagttgacataaaaaatttgcaactggattac 480

Db    1037 AACCAGATCAGCTGTATTGAAGATGGGGCATTCAGGGCTCTCCGGGACCTGGAAGTGCTC 1096
Qy     481 aaccagatcagctgtattgaagatggggcattcagggctctccgggacctggaagtgctc 540

Db    1097 ACTCTCAACAATAACAACATTACTAGACTTTCTGTGGCAAGTTTCAACCATATGCCTAAA 1156
Qy     541 actctcaacaataacaacattactagactttctgtggcaagtttcaaccatatgcctaaa 600

Db    1157 CTTAGGACTTTTCGACTGCATTCAAACAACCTGTATTGTGACTGCCACCTGGCCTGGCTC 1216
Qy     601 cttaggacttttcgactgcattcaaacaacctgtattgtgactgccacctggcctggctc 660

Db    1217 TCCGACTGGCTTCGCCAAAGGCCTCGGGTTGGTCTGTACACTCAGTGTATGGGCCCCTCC 1276
           ||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||
Qy     661 tccgactggcttcgcaaaaggcctcgggttggtctgtacactcagtgtatgggcccctcc 720

Db    1277 CACCTGAGAGGCCATAATGTAGCCGAGGTTCAAAAACGAGAATTTGTCTGCAGTG 1331
Qy     721 cacctgagaggccataatgtagccgaggttcaaaaacgagaatttgtctgcagtgatgag 780

Db    1332        GTCACCAGTCATTTATGGCTCCTTCTTGTAGTGTTTTGCACTGCCCTGCCGCC 1384
Qy     781 gaagaaggtcaccagtcatttatggctccttcttgtagtgttttgcactgccctgccgcc 840

Db    1385 TGTACCTGTAGCAACAATATCGTAGACTGTCGTGGGAAAGGTCTCACTGAGATCCCCACA 1444
Qy     841 tgtacctgtagcaacaatatcgtagactgtcgtgggaaaggtctcactgagatccccaca 900

Db    1445 AATCTTCCAGAGACCATCACAGAAATACGTTTGGAACAGAACACAATCAAAGTCATCCCT 1504
Qy     901 aatcttccagagaccatcacagaaatacgtttggaacagaacacaatcaaagtcatccct 960

Db    1505 CCTGGAGCTTTCTCACCATATAAAAAGCTTAGACGAATTGACCTGAGCAATAATCAGATC 1564
Qy     961 cctggagctttctcaccatataaaaagcttagacgaattgacctgagcaataatcagatc 1020

Db    1565 TCTGAACTTGCACCAGATGCTTTCCAAGGACTACGCTCTCTGAATTCACTTGTCCTCTAT 1624
Qy    1021 tctgaacttgcaccagatgctttccaaggactacgctctctgaattcacttgtcctctat 1080

Db    1625 GGAAATAAAATCACAGAACTCCCCAAAAGTTTATTTGAAGGACTGTTTTCCTTACAGCTC 1684
Qy    1081 ggaaataaaatcacagaactccccaaaagtttatttgaaggactgttttccttacagctc 1140

Db    1685 CTATTATTGAATGCCAACAAGATAAACTGCCTTCGGGTAGATGCTTTTCAGGATCTCCAC 1744
Qy    1141 ctattattgaatgccaacaagataaactgccttcgggtagatgcttttcaggatctccac 1200

Db    1745 AACTTGAACCTTCTCTCCCTATATGACAACAAGCTTCAGACCATCGCCAAGGGGACCTTT 1804
Qy    1201 aacttgaaccttctctccctatatgacaacaagcttcagaccatcgccaaggggaccttt 1260

Db    1805 TCACCTCTTCGGGCCATTCAAACTATGCATTTGGCCCAGAACCCCTTTATTTGTGACTGC 1864
Qy    1261 tcacctcttcgggccattcaaactatgcatttggcccagaacccctttatttgtgactgc 1320

Db    1865 CATCTCAAGTGGCTAGCGGATTATCTCCATACCAACCCGATTGAGACCAGTGGTGCCCGT 1924
Qy    1321 catctcaagtggctagcggattatctccataccaacccgattgagaccagtggtgcccgt 1380

Db    1925 TGCACCAGCCCCCGCCGCCTGGCAAACAAAAGAATTGGACAGATCAAAAGCAAGAAATTC 1984
Qy    1381 tgcaccagcccccgccgcctggcaaacaaaagaattggacagatcaaaagcaagaaattc 1440

Db    1985 CGTTGTTCAGGTACAGAAGATTATCGATCAAAATTAAGTGGAGACTGCTTTGCGGATCTG 2044
Qy    1441 cgttgttcaggtacagaagattatcgatcaaaattaagtggagactgctttgcggatctg 1500

Db    2045 GCTTGCCCTGAAAAGTGTCGCTGTGAAGGAACCACAGTAGATTGCTCTAATCAAAAGCTC 2104
Qy    1501 gcttgccctgaaaagtgtcgctgtgaaggaaccacagtagattgctctaatcaaaagctc 1560

Db    2105 AACAAAATCCCGGAGCACATTCCCCAGTACACTGCAGAGTTGCGTCTCAATAATAATGAA 2164
Qy    1561 aacaaaatcccggagcacattccccagtacactgcagagttgcgtctcaataataatgaa 1620

Db    2165 TTTACCGTGTTGGAAGCCACAGGAATCTTTAAGAAACTTCCTCAATTACGTAAAATAAAC 2224
Qy    1621 tttaccgtgttggaagccacaggaatctttaagaaacttcctcaattacgtaaaataaac 1680

Db    2225 TTTAGCAACAATAAGATCACAGATATTGAGGAGGGAGCATTTGAAGGAGCATCTGGTGTA 2284
Qy    1681 tttagcaacaataagatcacagatattgaggagggagcatttgaaggagcatctggtgta 1740

Db    2285 AATGAAATACTTCTTACGAGTAATCGTTTGGAAAATGTGCAGCATAAGATGTTCAAGGGA 2344
Qy    1741 aatgaaatacttcttacgagtaatcgtttggaaaatgtgcagcataagatgttcaaggga 1800

Db    2345 TTGGAAAAGCCTCAAAACTT-GATGTTGAGAAGCAATCGAATAACCTGTGTGGGGAATGA 2403
Qy    1801 ttggaaa-gcctcaaaactttgatgttgagaagcaatcgaataacctgtgtggggaatga 1859

Db    2404 CAGTTTCATAGGACTCAGTTCTGTGCGTATGCTTTCTTTGTATGATAATCAAATTACTAC 2463
Qy    1860 cagtttcataggactcagttctgtgcgtttgctttctttgtatgataatcaaattactac 1919

Db    2464 AGTTGCACCAGGGGCATTTGATACTCTCCATTCTTTATCTACTCTAAACCTCTTGGCCAA 2523
Qy    1920 agttgcaccaggggcatttgatactctccattctttatctactctaaacctcttggccaa 1979

Db    2524 TCCTTTTAACTGTAACTGCTACCTGGCTTGGTTGGGAGAGTGGCTGAGAAAGAAGAGAAT 2583
Qy    1980 tccttttaactgtaactgctacctggcttggttgggagagtggctgagaaagaagagaat 2039

Db    2584 TGTCACGGGAAATCCTAGATGTCAAAAACCATACTTCCTGAAAGAAATACCCATCCAGGA 2643
Qy    2040 tgtcacgggaaatcctagatgtcaaaaaccatacttcctgaaagaaatacccatccagga 2099

Db    2644 TGTGGCCATTCAGGACTTCACTTGTGATGACGGAAATGATGACAATAGTTGCTCCCCACT 2703
Qy    2100 tgtggccattcaggacttcacttgtgatgacggaaatgatgacaatagttgctccccact 2159

Db    2704 TTCTCGCTGTCCTACTGAATGTACTTGCTTGGATACAGTCGTCCGATGTAGCAACAAGGG 2763
Qy    2160 ttctcgctgtcctactgaatgtacttgcttggatacagtcgtccgatgtagcaacaaggg 2219

Db    2764 TTTGAAGGTCTTGCCGAAAGGTATTCCAAGAGATGTCACAGAGTTGTATCTGGATGGAAA 2823
Qy    2220 tttgaaggtcttgccgaaaggtattccaagagatgtcacagagttgtatctggatggaaa 2279

Db    2824 CCAATTTACACTGGTTCCCAAGGAACTCTCCAACTACAAACATTTAACACTTATAGACTT 2883
Qy    2280 ccaatttacactggttcccaaggaactctccaactacaaacatttaacacttatagactt 2339

Db    2884 AAGTAACAACAGAATAAGCACGCTTTCTAATCAGAGCTTCAGCAACATGACCCAGCTCCT 2943
Qy    2340 aagtaacaacagaataagcacgctttctaatcagagcttcagcaacatgacccagctcct 2399

Db    2944 CACCTTAATTCTTAGTTACAACCGTCTGAGATGTATTCCTCCTCGCACCTTTGATGGATT 3003
Qy    2400 caccttaattcttagttacaaccgtctgagatgtattcctcctcgcacctttgatggatt 2459

Db    3004 AAAGTCTCTTCGATTACTTTCTCTACATGGAAATGACATTTCTGTTGTGCCTGAAGGTGC 3063
Qy    2460 aaagtctcttcgattactttctctacatggaaatgacatttctgttgtgcctgaaggtgc 2519

Db    3064 TTTCAATGATCTTTCTGCATTATCACATCTAGCAATTGGAGCCAACCCTCTTTACTGTGA 3123
3123 
Qy    2520 tttcaatgatctttctgcattatcacatctagcaattggagccaaccctctttactgtga 2579

Db    3124 TTGTAACATGCAGTGGTTATCCGACTGGGTGAAGTCGGAATATAAGGAGCCTGGAATTGC 3183
Qy    2580 ttgtaacatgcagtggttatccgactgggtgaagtcggaatataaggagcctggaattgc 2639

Db    3184 TCGTTGTGCTGGTCCTGGAGAAATGGCAGATAAACTTTTACTCACAACTCCCTCCAAAAA 3243
Qy    2640 tcgttgtgctggtcctggagaaatggcagataaacttttactcacaactccctccaaaaa 2699

Db    3244 ATTTACCTGTCAAGGTCCTGTGGATGTCAATATTCTAGCTAAGTGTAACCCCTGCCTATC 3303
Qy    2700 atttacctgtcaaggtcctgtggatgtcaatattctagctaagtgtaacccctgcctatc 2759

Db    3304 AAATCCGTGTAAAAATGATGGCACATGTAATAGTGATCCAGTTGACTTTTACCGATGCAC 3363
Qy    2760 aaatccgtgtaaaaatgatggcacatgtaatagtgatccagttgacttttaccgatgcac 2819

Db    3364 CTGTCCATATGGTTTCAAGGGGCAGGACTGTGATGTCCCAATTCATGCCTGCATCAGTAA 3423
Qy    2820 ctgtccatatggtttcaaggggcaggactgtgatgtcccaattcatgcctgcatcagtaa 2879

Db    3424 CCCATGTAAACATGGAGGAACTTGCCACTTAAAGGAAGGAGAAGAAGATGGATTCTGGTG 3483
Qy    2880 cccatgtaaacatggaggaacttgccacttaaaggaaggagaagaagatggattctggtg 2939

Db    3484 TATTTGTGCTGATGGATTTGAAGGAGAAAATTGTGAAGTCAACGTTGATGATTGTGAAGA 3543
Qy    2940 tatttgtgctgatggatttgaaggagaaaattgtgaagtcaacgttgatgattgtgaaga 2999

Db    3544 TAATGACTGTGAAAATAATTCTACATGTGTCGATGGCATTAATAACTACACATGCCTTTG 3603
Qy    3000 taatgactgtgaaaataattctacatgtgtcgatggcattaataactacacatgcctttg 3059

Db    3604 CCCACCTGAGTATACAGGTGAGTTGTGTGAGGAGAAGCTGGACTTCTGTGCCCAGGACCT 3663
Qy    3060 cccacctgagtatacaggtgagttgtgtgaggagaagctggacttctgtgcccaggacct 3119

Db    3664 GAACCCCTGCCAGCACGATTCAAAGTGCATCCTAACTCCAAAGGGATTCAAATGTGACTG 3723
Qy    3120 gaacccctgccagcacgattcaaagtgcatcctaactccaaagggattcaaatgtgactg 3179

Db    3724 CACACCAGGGTACGTAGGTGAACACTGCGACATCGATTTTGACGACTGCCAAGACAACAA 3783
Qy    3180 cacaccagggtacgtaggtgaacactgcgacatcgattttgacgactgccaagacaacaa 3239

Db    3784 GTGTAAAAACGGAGCCCACTGCACAGATGCAGTGAACGGCTATACGTGCATATGCCCCGA 3843
Qy    3240 gtgtaaaaacggagcccactgcacagatgcagtgaacggctatacgtgcatatgccccga 3299

Db    3844 AGGTTACAGTGGCTTGTTCTGTGAGTTTTCTCCACCCATGGTCCTCCCTCGTACCAGCCC 3903
Qy    3300 aggttacagtggcttgttctgtgagttttctccacccatggtcctccctcgtaccagccc 3359

Db    3904 CTGTGATAATTTTGATTGTCAGAATGGAGCTCAGTGTATCGTCAGAATAAATGAGCCAAT 3963
Qy    3360 ctgtgataattttgattgtcagaatggagctcagtgtatcgtcagaataaatgagccaat 3419

Db    3964 ATGTCAGTGTTTGCCTGGCTATCAGGGAGAAAAGTGTGAAAAATTGGTTAGTGTGAATTT 4023
Qy    3420 atgtcagtgtttgcctggctatcagggagaaaagtgtgaaaaattggttagtgtgaattt 3479

Db    4024 TATAAACAAAGAGTCTTATCTTCAGATTCCTTCAGCCAAGGTTCGGCCTCAGACGAACAT 4083
Qy    3480 tataaacaaagagtcttatcttcagattccttcagccaaggttcggcctcagacgaacat 3539

Db    4084 AACACTTCAGATTGCCACAGATGAAGACAGCGGAATCCTCCTGTATAAGGGTGACAAAGA 4143
Qy    3540 aacacttcagattgccacagatgaagacagcggaatcctcctgtataagggtgacaaaga 3599

Db    4144 CCATATCGCGGTAGAACTCTATCGGGGGCGTGTTCGTGCCAGCTATGACACCGGCTCTCA 4203
Qy    3600 ccatatcgcggtagaactctatcgggggcgtgttcgtgccagctatgacaccggctctca 3659

Db    4204 TCCAGCTTCTGCCATTTACAGTGTGGAGACAATCAATGATGGAAACTTCCACATTGTGGA 4263
Qy    3660 tccagcttctgccatttacagtgtggagacaatcaatgatggaaacttccacattgtgga 3719

Db    4264 ACTACTTGCCTTGGATCAGAGTCTCTCTTTGTCCGTGGATGGTGGGAACCCCAAAATCAT 4323
Qy    3720 actacttgccttggatcagagtctctctttgtccgtggatggtgggaaccccaaaatcat 3779

Db    4324 CACTAACTTGTCAAAGCAGTCCACTCTGAATTTTGACTCTCCACTCTATGTAGGAGGCAT 4383
Qy    3780 cactaacttgtcaaagcagtccactctgaattttgactctccactctatgtaggaggcat 3839

Db    4384 GCCAGGGAAGAGTAACGTGGCATCTCTGCGCCAGGCCCCTGGGCAGAACGGAACCAGCTT 4443
Qy    3840 gccagggaagagtaacgtggcatctctgcgccaggcccctgggcagaacggaaccagctt 3899

Db    4444 CCACGGCTGCATCCGGAACCTTTACATCAACAGTGAGCTGCAGGACTTCCAGAAGGTGCC 4503
Qy    3900 ccacggctgcatccggaacctttacatcaacagtgagctgcaggacttccagaaggtgcc 3959

Db    4504 GATGCAAACAGGCATTTTGCCTGGCTGTGAGCCATGCCACAAGAAGGTGTGTGCCCATGG 4563
Qy    3960 gatgcaaacaggcattttgcctggctgtgagccatgccacaagaaggtgtgtgcccatgg 4019

Db    4564 CACATGCCAGCCCAGCAGCCAGGCAGGCTTCACCTGCGAGTGCCAGGAAGGATGGATGGG 4623
Qy    4020 cacatgccagcccagcagccaggcaggcttcacctgcgagtgccaggaaggatggatggg 4079

Db    4624 GCCCCTCTGTGACCAACGGACCAATGACCCTTGCCTTGGAAATAAATGCGTACATGGCAC 4683
Qy    4080 gcccctctgtgaccaacggaccaatgacccttgccttggaaataaatgcgtacatggcac 4139

Db    4684 CTGCTTGCCCATCAATGCGTTCTCCTACAGCTGTAAGTGCTTGGAGGGCCATGGAGGTGT 4743
Qy    4140 ctgcttgcccatcaatgcgttctcctacagctgtaagtgcttggagggccatggaggtgt 4199

Db    4744 CCTCTGTGATGAAGAGGAGGATCTGTTTAACCCATGCCAGGCGATCAAGTGCAAGCATGG 4803
Qy    4200 cctctgtgatgaagaggaggatctgtttaacccatgccaggcgatcaagtgcaagcatgg 4259

Db    4804 GAAGTGCAGGCTTTCAGGTCTGGGGCAGCCCTACTGTGAATGCAGCAGTGGATACACGGG 4863
Qy    4260 gaagtgcaggctttcaggtctggggcagccctactgtgaatgcagcagtggatacacggg 4319

Db    4864 GGACAGCTGTGATCGAGAAATCTCTTGTCGAGGGGAAAGGATAAGAGATTATTACCAAAA 4923
Qy    4320 ggacagctgtgatcgagaaatctcttgtcgaggggaaaggataagagattattaccaaaa 4379

Db    4924 GCAGCAGGGCTATGCTGCTTGCCAAACAACCAAGAAGGTGTCCCGATTAGAGTGCAGAGG 4983
Qy    4380 gcagcagggctatgctgcttgccaaacaaccaagaaggtgtcccgattagagtgcagagg 4439

Db    4984 TGGGTGTGCAGGAGGGCAGTGCTGTGGACCGCTGAGGAGCAAGCGGCGGAAATACTCTTT 5043
Qy    4440 tgggtgtgcaggagggcagtgctgtggaccgctgaggagcaagcggcggaaatactcttt 4499

Db    5044 CGAATGCACTGACGGCTCCTCCTTTGTGGACGAGGTTGAGAAAGTGGTGAAGTGCGGCTG 5103
Qy    4500 cgaatgcactgacggctcctcctttgtggacgaggttgagaaagtggtgaagtgcggctg 4559

Db    5104 TACGAGGTGTGTGTCCTAAACACACTCCCGGCAGCTCTGTCTTTGGAAAAGGTTGTATAC 5163
Qy    4560 tacgaggtgtgtgtcctaaacacactcccggcagctctgtctttggaaaaggttgtatac 4619

Db    5164 TTCTTGACCATGTGGGACTAATGAATGCTTCATAGTGGAAATATTTGAAATATATTGTAA 5223
Qy    4620 ttcttgaccatgtgggactaatgaatgcttcatagtggaaatatttgaaatatattgtaa 4679

Db    5224 AATACAGAAC 5233
Qy    4680 aatacagaac 4689
RESULT 2
LOCUS       AB017168 4950 bp mRNA PRI 06-FEB-1999
DEFINITION  Homo sapiens mRNA for Slit-2 protein, complete cds.
ACCESSION   AB017168
NID         g4049586
VERSION     AB017168.1 GI:4049586
KEYWORDS    slit-2; Slit-2 protein.
SOURCE      Homo sapiens fetal tissue_lib:lung cDNA to mRNA.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria;
            Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 4950)
  AUTHORS   Itoh,A. and Sakano,S.
  TITLE     Direct Submission
  JOURNAL   Submitted (27-AUG-1998) to the DDBJ/EMBL/GenBank databases. Akira
            Itoh, Asahi Chemical industry co.,ltd., Life Science Fundamental
            Research Laboratory; 2-1, Samejima, Fuji, Shizuoka 416-8501, Japan
            (E-mail:a8611483@ut.asahi-kasei.co.jp, Tel:+81-545-62-3231,
            Fax:+81-545-62-3249)
REFERENCE   2 (sites)
  AUTHORS   Itoh,A., Miyabayashi,T., Ohno,M. and Sakano,S.
  TITLE     Cloning and expressions of three mammalian homologues of Drosophila
            slit suggest possible roles for Slit in the formation and
            maintenance of the nervous system
  JOURNAL   Brain Res. Mol. Brain Res. 62 (2), 175-186 (1998)
  MEDLINE   99033071
FEATURES             Location/Qualifiers
     source          1..4950
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /dev_stage="fetal"
                     /tissue_lib="lung"
     gene            205..4794
                     /gene="slit-2"
     CDS             205..4794
                     /gene="slit-2"
                     /codon_start=1
                     /product="Slit-2 protein"
                     /protein_id="BAA35185.1"
                     /db_xref="PID:d1036171"
                     /db_xref="PID:g4049587"
                     /db_xref="GI:4049587"
                     /translation="MRGVGWQMLSLSLGLVLAILNKVAPQACPAQCSCSGSTVDCHGL
                     ALRSVPRNIPRNTERLDLNGNNITRITKTDFAGLRHLRVLQLMENKISTIERGAFQDL
                     KELERLRLNRNHLQLFPELLFLGTAKLYRLDLSENQIQAIPRKAFRGAVDIKNLQLDY
                     NQISCIEDGAFRALRDLEVLTLNNNNITRLSVASFNHMPKLRTFRLHSNNLYCDCHLA
                     WLSDWLRQRPRVGLYTQCMGPSHLRGHNVAEVQKREFVCSGHQSFMAPSCSVLHCPAA
                     CTCSNNIVDCRGKGLTEIPTNLPETITEIRLEQNTIKVIPPGAFSPYKKLRRIDLSNN
                     QISELAPDAFQGLRSLNSLVLYGNKITELPKSLFEGLFSLQLLLLNANKINCLRVDAF
                     QDLHNLNLLSLYDNKLQTIAKGTFSPLRAIQTMHLAQNPFICDCHLKWLADYLHTNPI
                     ETSGARCTSPRRLANKRIGQIKSKKFRCSAKEQYFIPGTEDYRSKLSGDCFADLACPE
                     KCRCEGTTVDCSNQKLNKIPEHIPQYTAELRLNNNEFTVLEATGIFKKLPQLRKINFS
                     NNKITDIEEGAFEGASGVNEILLTSNRLENVQHKMFKGLESLKTLMLRSNRITCVGND
                     SFIGLSSVRLLSLYDNQITTVAPGAFDTLHSLSTLNLLANPFNCNCYLAWLGEWLRKK
                     RIVTGNPRCQKPYFLKEIPIQDVAIQDFTCDDGNDDNSCSPLSRCPTECTCLDTVVRC
                     SNKGLKVLPKGIPRDVTELYLDGNQFTLVPKELSNYKHLTLIDLSNNRISTLSNQSFS
                     NMTQLLTLILSYNRLRCIPPRTFDGLKSLRLLSLHGNDISVVPEGAFNDLSALSHLAI
                     GANPLYCDCNMQWLSDWVKSEYKEPGIARCAGPGEMADKLLLTTPSKKFTCQGPVDVN
                     ILAKCNPCLSNPCKNDGTCNSDPVDFYRCTCPYGFKGQDCDVPIHACISNPCKHGGTC
                     HLKEGEEDGFWCICADGFEGENCEVNVDDCEDNDCENNSTCVDGINNYTCLCPPEYTG
                     ELCEEKLDFCAQDLNPCQHDSKCILTPKGFKCDCTPGYVGEHCDIDFDDCQDNKCKNG
                     AHCTDAVNGYTCICPEGYSGLFCEFSPPMVLPRTSPCDNFDCQNGAQCIVRINEPICQ
                     CLPGYQGEKCEKLVSVNFINKESYLQIPSAKVRPQTNITLQIATDEDSGILLYKGDKD
                     HIAVELYRGRVRASYDTGSHPASAIYSVETINDGNFHIVELLALDQSLSLSVDGGNPK
                     IITNLSKQSTLNFDSPLYVGGMPGKSNVASLRQAPGQNGTSFHGCIRNLYINSELQDF
                     QKVPMQTGILPGCEPCHKKVCAHGTCQPSSQAGFTCECQEGWMGPLCDQRTNDPCLGN
                     KCVHGTCLPINAFSYSCKCLEGHGGVLCDEEEDLFNPCQAIKCKHGKCRLSGLGQPYC
                     ECSSGYTGDSCDREISCRGERIRDYYQKQQGYAACQTTKKVSRLECRGGCAGGQCCGP
                     LRSKRRKYSFECTDGSSFVDEVEKVVKCGCTRCVS"
BASE COUNT  1355 a 1138 c 1186 g 1271 t
ORIGIN



Query Match 94.6%; Score 4502; DB 29; Length 4950; 

Best Local Similarity 99.2%; Pred. No. 0.00e+00; 

Matches 4720; Conservative 0; Mismatches 2; Indels 36; Gaps 2; 

Db     205 ATGCGCGGCGTTGGCTGGCAGATGCTGTCCCTGTCGCTGGGGTTAGTGCTGGCGATCCTG 264
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy       1 atgcgcggcgttggctggcagatgctgtccctgtcgctggggttagtgctggcgatcctg 60

Db     265 AACAAGGTGGCACCGCAGGCGTGCCCGGCGCAGTGCTCTTGCTCGGGCAGCACAGTGGAC 324
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy      61 aacaaggtggcaccgcaggcgtgcccggcgcagtgctcttgctcgggcagcacagtggac 120

Db     325 TGTCACGGGCTGGCGCTGCGCAGCGTGCCCAGGAATATCCCCCGCAACACCGAGAGACTG 384
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     121 tgtcacgggctggcgctgcgcagcgtgcccaggaatatcccccgcaacaccgagagactg 180

Db     385 GATTTAAATGGAAATAACATCACAAGAATTACGAAGACAGATTTTGCTGGTCTTAGACAT 444
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     181 gatttaaatggaaataacatcacaagaattacgaagacagattttgctggtcttagacat 240

Db     445 CTAAGAGTTCTTCAGCTTATGGAGAATAAGATTAGCACCATTGAAAGAGGAGCATTCCAG 504
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     241 ctaagagttcttcagcttatggagaataagattagcaccattgaaagaggagcattccag 300

Db     505 GATCTTAAAGAACTAGAGAGACTGCGTTTAAACAGAAATCACCTTCAGCTGTTTCCTGAG 564
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     301 gatcttaaagaactagagagactgcgtttaaacagaaatcaccttcagctgtttcctgag 360

Db     565 TTGCTGTTTCTTGGGACTGCGAAGCTATACAGGCTTGATCTCAGTGAAAACCAAATTCAG 624
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     361 ttgctgtttcttgggactgcgaagctatacaggcttgatctcagtgaaaaccaaattcag 420

Db     625 GCAATCCCAAGGAAAGCTTTCCGTGGGGCAGTTGACATAAAAAATTTGCAACTGGATTAC 684
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     421 gcaatcccaaggaaagctttccgtggggcagttgacataaaaaatttgcaactggattac 480

Db     685 AACCAGATCAGCTGTATTGAAGATGGGGCATTCAGGGCTCTCCGGGACCTGGAAGTGCTC 744
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     481 aaccagatcagctgtattgaagatggggcattcagggctctccgggacctggaagtgctc 540

Db     745 ACTCTCAACAATAACAACATTACTAGACTTTCTGTGGCAAGTTTCAACCATATGCCTAAA 804
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     541 actctcaacaataacaacattactagactttctgtggcaagtttcaaccatatgcctaaa 600

Db     805 CTTAGGACTTTTCGACTGCATTCAAACAACCTGTATTGTGACTGCCACCTGGCCTGGCTC 864
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     601 cttaggacttttcgactgcattcaaacaacctgtattgtgactgccacctggcctggctc 660

Db     865 TCCGACTGGCTTCGCCAAAGGCCTCGGGTTGGTCTGTACACTCAGTGTATGGGCCCCTCC 924
           ||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||
Qy     661 tccgactggcttcgcaaaaggcctcgggttggtctgtacactcagtgtatgggcccctcc 720

Db     925 CACCTGAGAGGCCATAATGTAGCCGAGGTTCAAAAACGAGAATTTGTCTGCAGTG 979
           |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     721 cacctgagaggccataatgtagccgaggttcaaaaacgagaatttgtctgcagtgatgag 780

Db     980        GTCACCAGTCATTTATGGCTCCTTCTTGTAGTGTTTTGCACTGCCCTGCCGCC 1032
                  |||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     781 gaagaaggtcaccagtcatttatggctccttcttgtagtgttttgcactgccctgccgcc 840

Db    1033 TGTACCTGTAGCAACAATATCGTAGACTGTCGTGGGAAAGGTCTCACTGAGATCCCCACA 1092
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     841 tgtacctgtagcaacaatatcgtagactgtcgtgggaaaggtctcactgagatccccaca 900

Db    1093 AATCTTCCAGAGACCATCACAGAAATACGTTTGGAACAGAACACAATCAAAGTCATCCCT 1152
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     901 aatcttccagagaccatcacagaaatacgtttggaacagaacacaatcaaagtcatccct 960

Db    1153 CCTGGAGCTTTCTCACCATATAAAAAGCTTAGACGAATTGACCTGAGCAATAATCAGATC 1212
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Qy     961 cctggagctttctcaccatataaaaagcttagacgaattgacctgagcaataatcagatc 1020
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Db 1213 TCTGAACTTGCACCAGATGCTTTCCAAGGACTACGCTCTCTGAATTCACTTGTCCTCTAT 1272 

Oy 1021 tctgaacttgcaccagatgctttccaaggactacgctctctgaattcacttgtcctctat 1080 

Db 1273 GGAAATAAMTCACAGAACTCCCCAAAAGTTTATTTGAAGGACTGTTTTCCTTACAGCTC 1332 

Oy 1081 ggaaataaaatcacagaactccccaaaagtttatttgaaggactgttttccttacagctc 1140 

Db 1333 CTATTATTGAATGCCAACAAGATAAACTGCCTTCGGGTAGATGCTITTCAGGATCTCCAC 1392 

Qy 1141 ctattattgaatgccaacaagataaactgccttcgggtagatgcttttcaggatctccac 1200 

Db 1393 AACTTGAACCTTCTCTCCCTATATGACMCAAGCTTCAGACCATCGCCAAGGGGACCTTT 1452 

Qy 1201 aacttgaaccttctctccctatatgacaacaagcttcagaccatcgccaaggggaccttt 1260 

Db 1453 TCACCTCTTCGGGCCATTCAAACTATGCATTTGGCCCAGAACCCCTTTATTTGTGACTGC 1512 

Qy 1261 tcacctcttcgggccattcaaactatgcatttggcccagaacccctttatttgtgactgc 1320 

jj^ 1513 CATCTCAAGTGGCTAGCGGATTATCTCCATACCAACCCGATTGAGACCAGTGGTGCCCGT 1572 

1321 catctcaagtggctagcggattatctccataccaacccgattgagaccagtggtgcccgt 1380 

Db 1573 TGCACCAGCCCCCGCCGCCTGGCAMCAAAAGAATTGGACAGATCAAAAGCAAGAMTTC 1632 

Qy 1381 tgcaccagcccccgccgcctggcaaacaaaagaattggacagatcaaaagcaagaaattc 1440 

Db 1633 CGTTGTTCAGCTAAAGAACAGTATTTCATTCCAGGTACAGAAGATTATCGATCAAMTTA 1692 

Qy 1441 cgttgttcag gtacagaagattatcgatcaaaatta 1476 

Db 1693 AGTGGAGACTGCTTTGCGGATCTGGCTTGCCCTGAAAAGTGTCGCTGTGAAGGAACCACA 1752 

Qy 1477 agtggagactgctttgcggatctggcttgccctgaaaagtgtcgctgtgaaggaaccaca 1536 

Db 1753 GTAGATTGCTCTAATCAAAAGCTCAACAAAATCCCGGAGCACATTCCCCAGTACACTGCA 1812 

Qy 1537 gtagattgctctaatcaaaagctcaacaaaatcccggagcacattccccagtacactgca 1596 

Db 1813 GAGTTGCGTCTCAATAATAATGMTTTACCGTGTTGGMGCCACAGGAATCTTTAAGAAA 1872 

Qy 1597 gagttgcgtctcaataataatgaatttaccgtgttggaagccacaggaatctttaagaaa 1656 

Db 1873 CTTCCTCAATTACGTAAAATAAACTTTAGCAACAATAAGATCACAGATATTGAGGAGGGA 1932 

Qy 1657 cttcctcaattacgtaaaataaactttagcaacaataagatcacagatattgaggaggga 1716 

Db 1933 GCATTTGAAGGAGCATCTGGTGTAAATGAAATACTTCTTACGAGTAATCGTTTGGAAAAT 1992 

Qy 1717 gcatttgaaggagcatctggtgtaaatgaaatacttcttacgagtaatcgtttggaaaat 1776 

Db 1993 GTGCAGCATAAGATGTTCAAGGGATTGGAAAGCCTCAAAACTTTGATGTTGAGAAGCAAT 2052 

Qy 1777 gtgcagcataagatgttcaagggattggaaagcctcaaaactttgatgttgagaagcaat 1836 

Db 2053 CGAATAACCTGTGTGGGGAATGACAGTTTCATAGGACTCAGTTCTGTGCGTTTGCTTTCT 2112 

Qy 1837 cgaataacctgtgtggggaatgacagtttcataggactcagttctgtgcgtttgctttct 1896 

Db 2113 TTGTATGATAATCAAATTACTACAGTTGCACCAGGGGCATTTGATACTCTCCATTCTTTA 2172 

Qy 1897 ttgtatgataatcaaattactacagttgcaccaggggcatttgatactctccattcttta 1956 

Db 2173 TCTACTCTAAACCTCTTGGCCAATCCTTTTAACTGTAACTGCTACCTGGCTTGGTTGGGA 2232 

Qy 1957 tctactctaaacctcttggccaatccttttaactgtaactgctacctggcttggttggga 2016 

Db 2233 GAGTGGCTGAGAAAGAAGAGAATTGTCACGGGAAATCCTAGATGTCAAAAACCATACTTC 2292 

Qy 2017 gagtggctgagaaagaagagaattgtcacgggaaatcctagatgtcaaaaaccatacttc 2076 

Db 2293 CTGAAAGAAATACCCATCCAGGATGTGGCCATTCAGGACTTCACTTGTGATGACGGAAAT 2352 



Qy 2077 ctgaaagaaatacccatccaggatgtggccattcaggacttcacttgtgatgacggaaat 2136 

Db 2353 GATGACAATAGTTGCTCCCCACTTTCTCGCTGTCCTACTGAATGTACTTGCTTGGATACA 2412 

Qy 2137 gatgacaatagttgctccccactttctcgctgtcctactgaatgtacttgcttggataca 2196 

Db 2413 GTCGTCCGATGTAGCAACAAGGGTTTGAAGGTCTTGCCGAAAGGTATTCCAAGAGATGTC 2472 

Qy 2197 gtcgtccgatgtagcaacaagggtttgaaggtcttgccgaaaggtattccaagagatgtc 2256 

Db 2473 ACAGAGTTGTATCTGGATGGAAACCAATTTACACTGGTTCCCAAGGAACTCTCCAACTAC 2532 

Qy 2257 acagagttgtatctggatggaaaccaatttacactggttcccaaggaactctccaactac 2316 

Db 2533 AAACATTTAACACTTATAGACTTAAGTAACAACAGAATAAGCACGCTTTCTAATCAGAGC 2592 

Qy 2317 aaacatttaacacttatagacttaagtaacaacagaataagcacgctttctaatcagagc 2376 

Db 2593 TTCAGCAACATGACCCAGCTCCTCACCTTAATTCTTAGTTACAACCGTCTGAGATGTATT 2652 

Qy 2377 ttcagcaacatgacccagctcctcaccttaattcttagttacaaccgtctgagatgtatt 2436 

Db 2653 CCTCCTCGCACCTTTGATGGATTAAAGTCTCTTCGATTACTTTCTCTACATGGAAATGAC 2712 

Qy 2437 cctcctcgcacctttgatggattaaagtctcttcgattactttctctacatggaaatgac 2496 

Db 2713 ATTTCTGTTGTGCCTGAAGGTGCTTTCAATGATCTTTCTGCATTATCACATCTAGCAATT 2772 

Qy 2497 atttctgttgtgcctgaaggtgctttcaatgatctttctgcattatcacatctagcaatt 2556 

Db 2773 GGAGCCAACCCTCTTTACTGTGATTGTAACATGCAGTGGTTATCCGACTGGGTGAAGTCG 2832 

Qy 2557 ggagccaaccctctttactgtgattgtaacatgcagtggttatccgactgggtgaagtcg 2616 

Db 2833 GAATATAAGGAGCCTGGAATTGCTCGTTGTGCTGGTCCTGGAGAAATGGCAGATAAACTT 2892 

Qy 2617 gaatataaggagcctggaattgctcgttgtgctggtcctggagaaatggcagataaactt 2676 

Db 2893 TTACTCACAACTCCCTCCAAAAAATTTACCTGTCAAGGTCCTGTGGATGTCAATATTCTA 2952 

Qy 2677 ttactcacaactccctccaaaaaatttacctgtcaaggtcctgtggatgtcaatattcta 2736 

Db 2953 GCTAAGTGTAACCCCTGCCTATCAAATCCGTGTAAAAATGATGGCACATGTAATAGTGAT 3012 

Qy 2737 gctaagtgtaacccctgcctatcaaatccgtgtaaaaatgatggcacatgtaatagtgat 2796 

Db 3013 CCAGTTGACTTTTACCGATGCACCTGTCCATATGGTTTCAAGGGGCAGGACTGTGATGTC 3072 

Qy 2797 ccagttgacttttaccgatgcacctgtccatatggtttcaaggggcaggactgtgatgtc 2856 

Db 3073 CCAATTCATGCCTGCATCAGTAACCCATGTAAACATGGAGGAACTTGCCACTTAAAGGAA 3132 

Qy 2857 ccaattcatgcctgcatcagtaacccatgtaaacatggaggaacttgccacttaaaggaa 2916 

Db 3133 GGAGAAGAAGATGGATTCTGGTGTATTTGTGCTGATGGATTTGAAGGAGAAAATTGTGAA 3192 

Qy 2917 ggagaagaagatggattctggtgtatttgtgctgatggatttgaaggagaaaattgtgaa 2976 

Db 3193 GTCAACGTTGATGATTGTGAAGATAATGACTGTGAAAATAATTCTACATGTGTCGATGGC 3252 

Qy 2977 gtcaacgttgatgattgtgaagataatgactgtgaaaataattctacatgtgtcgatggc 3036 

Db 3253 ATTAATAACTACACATGCCTTTGCCCACCTGAGTATACAGGTGAGTTGTGTGAGGAGAAG 3312 

Qy 3037 attaataactacacatgcctttgcccacctgagtatacaggtgagttgtgtgaggagaag 3096 

Db 3313 CTGGACTTCTGTGCCCAGGACCTGAACCCCTGCCAGCACGATTCAAAGTGCATCCTAACT 3372 

Qy 3097 ctggacttctgtgcccaggacctgaacccctgccagcacgattcaaagtgcatcctaact 3156 

Db 3373 CCAAAGGGATTCAAATGTGACTGCACACCAGGGTACGTAGGTGAACACTGCGACATCGAT 3432 
Qy 3157 ccaaagggattcaaatgtgactgcacaccagggtacgtaggtgaacactgcgacatcgat 3216 

Db 3433 TTTGACGACTGCCAAGACAACAAGTGTAAAAACGGAGCCCACTGCACAGATGCAGTGAAC 3492 

Qy 3217 tttgacgactgccaagacaacaagtgtaaaaacggagcccactgcacagatgcagtgaac 3276 

Db 3493 GGCTATACGTGCATATGCCCCGAAGGTTACAGTGGCTTGTTCTGTGAGTTTTCTCCACCC 3552 

Qy 3277 ggctatacgtgcatatgccccgaaggttacagtggcttgttctgtgagttttctccaccc 3336 

Db 3553 ATGGTCCTCCCTCGTACCAGCCCCTGTGATAATTTTGATTGTCAGAATGGAGCTCAGTGT 3612 

Qy 3337 atggtcctccctcgtaccagcccctgtgataattttgattgtcagaatggagctcagtgt 3396 

Db 3613 ATCGTCAGAATAAATGAGCCAATATGTCAGTGTTTGCCTGGCTATCAGGGAGAAAAGTGT 3672 

Qy 3397 atcgtcagaataaatgagccaatatgtcagtgtttgcctggctatcagggagaaaagtgt 3456 

Db 3673 GAAAAATTGGTTAGTGTGAATTTTATAAACAAAGAGTCTTATCTTCAGATTCCTTCAGCC 3732 

Qy 3457 gaaaaattggttagtgtgaattttataaacaaagagtcttatcttcagattccttcagcc 3516 

Db 3733 AAGGTTCGGCCTCAGACGAACATAACACTTCAGATTGCCACAGATGAAGACAGCGGAATC 3792 

Qy 3517 aaggttcggcctcagacgaacataacacttcagattgccacagatgaagacagcggaatc 3576 

Db 3793 CTCCTGTATAAGGGTGACAAAGACCATATCGCGGTAGAACTCTATCGGGGGCGTGTTCGT 3852 

Qy 3577 ctcctgtataagggtgacaaagaccatatcgcggtagaactctatcgggggcgtgttcgt 3636 

Db 3853 GCCAGCTATGACACCGGCTCTCATCCAGCTTCTGCCATTTACAGTGTGGAGACAATCAAT 3912 

Qy 3637 gccagctatgacaccggctctcatccagcttctgccatttacagtgtggagacaatcaat 3696 

Db 3913 GATGGAAACTTCCACATTGTGGAACTACTTGCCTTGGATCAGAGTCTCTCTTTGTCCGTG 3972 

Qy 3697 gatggaaacttccacattgtggaactacttgccttggatcagagtctctctttgtccgtg 3756 

Db 3973 GATGGTGGGAACCCCAAAATCATCACTAACTTGTCAAAGCAGTCCACTCTGAATTTTGAC 4032 

Qy 3757 gatggtgggaaccccaaaatcatcactaacttgtcaaagcagtccactctgaattttgac 3816 

Db 4033 TCTCCACTCTATGTAGGAGGCATGCCAGGGAAGAGTAACGTGGCATCTCTGCGCCAGGCC 4092 

Qy 3817 tctccactctatgtaggaggcatgccagggaagagtaacgtggcatctctgcgccaggcc 3876 

Db 4093 CCTGGGCAGAACGGAACCAGCTTCCACGGCTGCATCCGGAACCTTTACATCAACAGTGAG 4152 

Qy 3877 cctgggcagaacggaaccagcttccacggctgcatccggaacctttacatcaacagtgag 3936 

Db 4153 CTGCAGGACTTCCAGAAGGTGCCGATGCAAACAGGCATTTTGCCTGGCTGTGAGCCATGC 4212 

Qy 3937 ctgcaggacttccagaaggtgccgatgcaaacaggcattttgcctggctgtgagccatgc 3996 

Db 4213 CACAAGAAGGTGTGTGCCCATGGCACATGCCAGCCCAGCAGCCAGGCAGGCTTCACCTGC 4272 

Qy 3997 cacaagaaggtgtgtgcccatggcacatgccagcccagcagccaggcaggcttcacctgc 4056 

Db 4273 GAGTGCCAGGAAGGATGGATGGGGCCCCTCTGTGACCAACGGACCAATGACCCTTGCCTT 4332 

Qy 4057 gagtgccaggaaggatggatggggcccctctgtgaccaacggaccaatgacccttgcctt 4116 

Db 4333 GGAAATAAATGCGTACATGGCACCTGCTTGCCCATCAATGCGTTCTCCTACAGCTGTAAG 4392 

Qy 4117 ggaaataaatgcgtacatggcacctgcttgcccatcaatgcgttctcctacagctgtaag 4176 

Db 4393 TGCTTGGAGGGCCATGGAGGTGTCCTCTGTGATGAAGAGGAGGATCTGTTTAACCCATGC 4452 

Qy 4177 tgcttggagggccatggaggtgtcctctgtgatgaagaggaggatctgtttaacccatgc 4236 

Db 4453 CAGGCGATCAAGTGCAAGCACGGGAAGTGCAGGCTTTCAGGTCTGGGGCAGCCCTACTGT 4512 

Qy 4237 caggcgatcaagtgcaagcatgggaagtgcaggctttcaggtctggggcagccctactgt 4296 



Db 4513 GAATGCAGCAGTGGATACACGGGGGACAGCTGTGATCGAGAAATCTCTTGTCGAGGGGAA 4572 

Qy 4297 gaatgcagcagtggatacacgggggacagctgtgatcgagaaatctcttgtcgaggggaa 4356 

Db 4573 AGGATAAGAGATTATTACCAAAAGCAGCAGGGCTATGCTGCTTGCCAAACAACCAAGAAG 4632 

Qy 4357 aggataagagattattaccaaaagcagcagggctatgctgcttgccaaacaaccaagaag 4416 

Db 4633 GTGTCCCGATTAGAGTGCAGAGGTGGGTGTGCAGGAGGGCAGTGCTGTGGACCGCTGAGG 4692 

Qy 4417 gtgtcccgattagagtgcagaggtgggtgtgcaggagggcagtgctgtggaccgctgagg 4476 

Db 4693 AGCAAGCGGCGGAAATACTCTTTCGAATGCACTGACGGCTCCTCCTTTGTGGACGAGGTT 4752 

Qy 4477 agcaagcggcggaaatactctttcgaatgcactgacggctcctcctttgtggacgaggtt 4536 

Db 4753 GAGAAAGTGGTGAAGTGCGGCTGTACGAGGTGTGTGTCCTAAACACACTCCCGGCAGCTC 4812 

Qy 4537 gagaaagtggtgaagtgcggctgtacgaggtgtgtgtcctaaacacactcccggcagctc 4596 

Db 4813 TGTCTTTGGAAAAGGTTGTATACTTCTTGACCATGTGGGACTAATGAATGCTTCATAGTG 4872 

Qy 4597 tgtctttggaaaaggttgtatacttcttgaccatgtgggactaatgaatgcttcatagtg 4656 

Db 4873 GAAATATTTGAAATATATTGTAAAATACAGAACAGACTTATTTTTATTATGAGAATAAAG 4932 

Qy 4657 gaaatatttgaaatatattgtaaaatacagaacagacttatttttattatgagaataaag 4716 

Db 4933 ACTTTTTTTCTGCATTTG 4950 

Qy 4717 actttttttctgcatttg 4734 



LOCUS AF074960 5121 bp mRNA ROD 04-MAR-1999 

DEFINITION Mus musculus neurogenic extracellular slit protein (Slit2) mRNA, 
partial cds. 

ACCESSION AF074960 

NID g4151258 

VERSION AF074960.1 GI:4151258 

KEYWORDS 

SOURCE house mouse. 
ORGANISM Mus musculus 

Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; 
Rodentia; Sciurognathi; Muridae; Murinae; Mus. 
REFERENCE 1 (bases 1 to 5121) 
AUTHORS Holmes,G.P., Negus,K., Burridge,L., Raman,S., Algar,E., Yamada,T. 
and Little,M.H. 
TITLE Distinct but overlapping expression patterns of two vertebrate slit 
homologs implies functional roles in CNS development and 
JOURNAL Mech. Dev. 79 (1-2), 57-72 (1998) 
REFERENCE 2 (bases 1 to 5121) 
AUTHORS Holmes,G.P., Negus,K., Smith,K., Yamada,T. and Little,M.H. 
TITLE Direct Submission 
JOURNAL Submitted (29-JUN-1998) Center for Molecular and Cellular Biology, 
University of Queensland, St. Lucia, Brisbane, Queensland 4072, 
Australia 

FEATURES Location/Qualifiers 
source 1..5121 
/organism="Mus musculus" 
/db_xref="taxon:10090" 
gene 1..5121 
/gene="Slit2" 
CDS <1..3078 
/gene="Slit2" 
/note="similar to Homo sapiens SLIT2; similar to 
Drosophila slit family" 
/codon_start=1 
/product="neurogenic extracellular slit protein" 
/protein_id="AAD04345.1" 
/db_xref="PID:g4151259" 
/db_xref="GI:4151259" 
/translation="ACPEKCRCEGTTVDCSNQRLNKIPDHIPQYTAELRLNNNEFTVL 
EATGIFKKLPQLRXINFSNNKITDIEEGAFEGASGVNEILLTSNRLENVQHKMFKGLE 
SLKTLMLRSNRISCVGNDSFIGLGSVRLLSLYDNQITTVAPGAFDXLHSLSTLNLLAN 
PFNCNCHLAWLGEWLRRKRIVTGNPRCQKPYFLKEIPIQDVAIQDFTCDDGNDDNSCS 
PLSRCPSECTCLDTXVRCSNKGLKVLPKGIPKDVTELYLDGNQFTLVPKELSNYKHLT 
LIDLSNNRISTLSNQXFSNMTQLLTLILSYNRLRCIPPRTFDGLKSLRLLSLHGNDIS 
VVPEGAFNDLSALSHLAIGANPLYCDCNMQWLSDWVKSEYKEPGIARCAGPGEMADKL 
LLTTPSKKFTCQGPMDITIQAKCNPCLSNPCKNDGTCNNDPVDFYRCTCPYGFKGQDC 
DVPIHACISNPCKHGGTCHLKEGENAGFWCTCADGFEGENCEVNIDDCEDNDCENNST 
CVDGINNYTCLCPPEYTGELCEEKLDFCAQDLNPCQHDSKCILTPKGFKCDCTPGYIG 
EHCDIDFDDCQDNKCKNGAHCTDAVNGYTCVCPEGYSGLFCEFSPPMVLPRTSPCDNF 
DCQNGAQCIIRINEPICQCLPGYLGEKCEKLVSVNFVNKESYLQIPSAKVRPQTNITL 
QIATDEDSGILLYKGDKDHIAVELYRGRVRASYDTGSHPASAIYSVETINDGNFHIVE 
LLTLDSSLSLSVDGGSPKVITNLSKQSTLNFDSPLYVGGMPGKNNVASLRQAPGQNGT 
SFHGCIRNLYINSELQDFRKMPMQTGILPGCEPCHKKVCAHGMCQPSSQSGFTCECEE 
GWMGPLCDQRTNDPCLGNKCVHGTCLPINAFSYSCKCLEGHGGVLCDEEEDLFNPCQM 
IKCKHGKCRLSGVGQPYCECNSGFTGDSCDREISCRGERIRDYYQKQQGYAACQTTKK 
VSRLECRGGCAGGQCCGPLRSKRRKYSFECTDGSSFVDEVEKVVKCGCARCAS" 
misc_feature 4..96 
/gene="Slit2" 
/note="Region: amino-flanking" 
misc_feature 100..132 
/gene="Slit2" 
/note="Region: leucine rich-region 3-1" 
misc_feature 148..207 
/gene="Slit2" 
/note="Region: leucine rich-region 3-1A" 
misc_feature 220..279 
/gene="Slit2" 
/note="Region: leucine rich-region 3-2" 
misc_feature 292..351 
/gene="Slit2" 
/note="Region: leucine rich-region 3-3" 
misc_feature 364..453 
/gene="Slit2" 
/note="Region: leucine rich-region 3-4" 
misc_feature 436..471 
/gene="Slit2" 
/note="Region: leucine rich-region 3-5" 
misc_feature 478..666 
/gene="Slit2" 
/note="Region: carboxy-flanking" 
misc_feature 667..759 
/gene="Slit2" 
/note="Region: amino-flanking" 
misc_feature 763..795 
/gene="Slit2" 
/note="Region: leucine rich region 4-1" 
misc_feature 805..864 
/gene="Slit2" 
/note="Region: leucine rich region 4-2" 
misc_feature 877..936 
/gene="Slit2" 
/note="Region: leucine rich region 4-3" 
misc_feature 949..1008 
/gene="Slit2" 
/note="Region: leucine rich region 4-4" 
misc_feature 1021..1056 
/gene="Slit2" 
/note="Region: leucine rich region 4-5" 
misc_feature 1063..1236 
/gene="Slit2" 
/note="Region: carboxy-flanking" 
misc_feature 1240..1353 
/gene="Slit2" 
/note="Region: EGF-like repeat 1" 
misc_feature 1357..1476 
/gene="Slit2" 
/note="Region: EGF-like repeat 2" 
misc_feature 1480..1590 
/gene="Slit2" 
/note="Region: EGF-like repeat 3" 
misc_feature 1594..1710 
/gene="Slit2" 
/note="Region: EGF-like repeat 4" 
misc_feature 1714..1824 
/gene="Slit2" 
/note="Region: EGF-like repeat 5" 
misc_feature 1849..1959 
/gene="Slit2" 
/note="Region: EGF-like repeat 6" 
misc_feature 1960..2475 
/gene="Slit2" 
/note="Region: agrin perlecan laminin slit motif" 
misc_feature 2476..2592 
/gene="Slit2" 
/note="Region: EGF-like repeat 6A" 
misc_feature 2599..2709 
/gene="Slit2" 
/note="Region: EGF-like repeat 7" 
misc_feature 2722..2832 
/gene="Slit2" 
/note="Region: EGF-like repeat 8" 
misc_feature 2932..3075 
/gene="Slit2" 
/note="Region: CTC knot" 
BASE COUNT 1533 a 1060 c 1106 g 1418 t 4 others 
ORIGIN 

Query Match 52.5%; Score 2498; DB 32; Length 5121; 

Best Local Similarity 88.6%; Pred. No. 0.00e+00; 

Matches 2883; Conservative 3; Mismatches 364; Indels 4; Gaps 2; 

Db 1 GCTTGTCCTGAGAAGTGTCGCTGTGAAGGGACCACAGTAGACTGCTCCAATCAAAGACTC 60 

urn inn iiiiiiiiiiiiiiiii minim inn imiii in 

Qy 1501 gcttgccctgaaaagtgtcgctgtgaaggaaccacagtagattgctctaatcaaaagctc 1560 

Db 61 AACAAAATCCCTGACCATATTCCCCAGTACACAGCAGAGCTGCGTCTCAATAATAATGAA 120 

: 1 1 II II llllllllllllll HUM llllllllllllllllllll 

Qy 1561 aacaaaatcccggagcacattccccagtacactgcagagttgcgtctcaataataatgaa 1620 

Db 121 TTCACAGTGTTAGAAGCCACGGGAATATTTAAGAAACTTCCTCAGTTACGTRAAATCAAC 180 

II II lllll llllllll lllll IIIIIIIIIIIIIIIII 111111:1111 III 
Qy 1621 tttaccgtgttggaagccacaggaatctttaagaaacttcctcaattacgtaaaataaac 1680 

Db 181 TTTAGCAACAATAAGATCACGGATATCGAGGAGGGTGCATTTGAAGGTGCGTCTGGTGTG 240 

imiimiiiiiimii iiiii iiiiiiii Milium || iiiinii 

Qy 1681 tttagcaacaataagatcacagatattgaggagggagcatttgaaggagcatctggtgta 1740 

Db 241 AATGAAATTCTTCTCACCAGTAACCGTTTGGAAAATGTTCAGCATAAGATGTTCAAAGGA 300 
llllllll lllll M lllll llllllllllllll IIIIIIIIIIIIIIIII III 

Qy 1741 aatgaaatacttcttacgagtaatcgtttggaaaatgtgcagcataagatgttcaaggga 1800 
Db 301 CTGGAGAGCCTCAAAACATTGATGCTGAGAAGTAATCGAATAAGCTGTGTTGGGAACGAC 360 

mi minimi mm 1111111 minium mm inn 111 

Qy 1801 ttggaaagcctcaaaactttgatgttgagaagcaatcgaataacctgtgtggggaatgac 1860 



Db 361 AGTTTCATAGGACTCGGCTCTGTGCGTCTGCTCTCTTTATATGACAATCAAATTACCACA 420 

iiiiiiniiimi i m 1 1 1 1 1 1 1 mi inn mi iiiiiiiiiii hi 

Qy 1861 agtttcataggactcagttctgtgcgtttgctttctttgtatgataatcaaattactaca 1920 

Db 421 GTGGCACCAGGAGCATTTGATYCTCTCCATTCATTATCCACTCTAAACCTCTTGGCCAAT 480 

II llllllll lllllllll llllllllll lllll mmiiinimiiini 
Qy 1921 gttgcaccaggggcatttgatactctccattctttatctactctaaacctcttggccaat 1980 

Db 481 CCTTTCAACTGTAACTGTCACCTGGCATGGCTGGGAGAATGGCTCAGAAGGAAAAGAATT 540 

iiiii iiiiiiiiiii mini in mini inn nn in mm 

Qy 1981 ccttttaactgtaactgctacctggcttggttgggagagtggctgagaaagaagagaatt 2040 

Db 541 GTAACAGGAAATCCTCGATGCCAAAAACCCTACTTCCTGAAGGAAATCCCAATCCAGGAT 600 

II II lllllllll llll llllllll IIIIIIIIIII lllll II lllllllll 
Qy 2041 gtcacgggaaatcctagatgtcaaaaaccatacttcctgaaagaaatacccatccaggat 2100 

Db 601 GTAGCCATTCAGGACTTCACCTGTGATGATGGAAATGATGACAATAGTTGCTCTCCACTC 660 
II llllinilllllllll llllllll lllllllllllllllllllllll lllll 
Qy 2101 

Db 661 

Qy 2161 

Db 721 

Qy 2221 

Db 781 

Qy 2281 

Db 841 

• 2341 
901 

Qy 2401 

Db 961 

Qy 2461 

Db 1021 

Qy 2521 

Db 1081 

Qy 2581 

Db 1141 

Qy 2641 

Db 1201 

Qy 2701 

Db 1261 

# 2761 
1321 

Qy 2821 

Db 1381 

Qy 2881 

Db 1441 

Qy 2941 

Db 1501 

Qy 3001 

Db 1561 

Qy 3061 

Db 1621 

Qy 3121 

Db 1681 

Qy ■ 3181 



Qy 2101 gtggccattcaggacttcacttgtgatgacggaaatgatgacaatagttgctccccactt 2160 

TCCCGTTGK 
I II III 



Db 721 TTGAAGGTCTTGCCTAAAGGTATTCCAAAAGATGTCACAGAGCTGTATCTGGATGGGAAC 780 
Qy 2221 ttgaaggtcttgccgaaaggtattccaagagatgtcacagagttgtatctggatggaaac 2280 
CAGTTTACGCTGGK 



ACCTTAATCCTCAGTTACAACCGICTGAGATGTATCGCICCACGAACCTTIGATGGATTG 
Ml II 



lllll Ml 

aagtctcttcgattactttctctacatggaaatgacatttctgttgtgcctgaaggtgct 
TTCAATGACTTGTCAGCCTTGTCACACTTAGCGATTGGAGCCAACCCTCTTTACTGTGAT 



Db 1321 TGCCCATATGGATTCAAGGGTCAGGACTGTGATGTCCCCATTCATGCTTGTATCAGTAAT 1380 

Qy 2821 tgtccatatggtttcaaggggcaggactgtgatgtcccaattcatgcctgcatcagtaac 2880 

Db 1381 CCATGTAAACATGGAGGAACTTGTCACTTAAAGGAAGGAGAGAATGCTGGATTCTGGTGC 1440 



Db 1741 TGTAAAAACGGTGCTCACTGCACAGATGCCGTGAACGGATACACGTGCGTCTGTCCTGAA 1800 

Qy 3241 tgtaaaaacggagcccactgcacagatgcagtgaacggctatacgtgcatatgccccgaa 3300 

Db 1801 GGCTACAGTGGCTTGTTCTGTGAGTTTTCTCCACCCATGGTCCTCCCTCGCACCAGCCCC 1860 

Qy 3301 ggttacagtggcttgttctgtgagttttctccacccatggtcctccctcgtaccagcccc 3360 

Db 1861 TGTGATAATTTTGATTGCCAGAATGGAGCCCAGTGTATCATCAGGATAAATGAACCAATA 1920 

Qy 3361 tgtgataattttgattgtcagaatggagctcagtgtatcgtcagaataaatgagccaata 3420 

Db 1921 TGCCAGTGTTTGCCTGGCTACCTGGGAGAGAAGTGTGAGAAATTGGTCAGTGTGAATTTT 1980 

Qy 3421 tgtcagtgtttgcctggctatcagggagaaaagtgtgaaaaattggttagtgtgaatttt 3480 

Db 1981 GTAAACAAAGAGTCCTATCTTCAGATTCCTTCAGCCAAGGTTCGGCCTCAGACAAACATC 2040 

Qy 3481 ataaacaaagagtcttatcttcagattccttcagccaaggttcggcctcagacgaacata 3540 

Db 2041 ACACTTCAGATTGCCACAGATGAAGACAGCGGCATCCTCTTGTATAAAGGTGACAAAGAC 2100 

Qy 3541 acacttcagattgccacagatgaagacagcggaatcctcctgtataagggtgacaaagac 3600 

Db 2101 CACATTGCCGTGGAACTCTATCGAGGGCGAGTTCGAGCCAGCTATGACACCGGCTCTCAT 2160 

Qy 3601 catatcgcggtagaactctatcgggggcgtgttcgtgccagctatgacaccggctctcat 3660 

Db 2161 CCGGCTTCTGCCATTTACAGTGTGGAGACAATCAATGATGGAAACTTCCACATTGTGGAG 2220 

Qy 3661 ccagcttctgccatttacagtgtggagacaatcaatgatggaaacttccacattgtggaa 3720 

Db 2221 CTACTGACCCTGGATTCCAGTCTTTCCCTCTCTGTGGATGGAGGAAGCCCTAAAGTCATC 2280 

Qy 3721 ctacttgccttggatcagagtctctctttgtccgtggatggtgggaaccccaaaatcatc 3780 

Db 2281 ACCAATTTGTCAAAACAATCTACTCTGAATTTCGACTCTCCACTCTATGTAGGAGGCATG 2340 

Qy 3781 actaacttgtcaaagcagtccactctgaattttgactctccactctatgtaggaggcatg 3840 

Db 2341 CCTGGGAAAAATAACGTGGCATCCCTGCGCCAGGCCCCTGGGCAAAATGGCACCAGCTTC 2400 

Qy 3841 ccagggaagagtaacgtggcatctctgcgccaggcccctgggcagaacggaaccagcttc 3900 

Db 2401 CATGGCTGTATCCGGAACCTTTACATTAACAGTGAGCTGCAGGACTTCCGGAAAATGCCT 2460 

Qy 3901 cacggctgcatccggaacctttacatcaacagtgagctgcaggacttccagaaggtgccg 3960 

: 2520 

Qy 3961 atgcaaacaggcattttgcctggctgtgagccatgccacaagaaggtgtgtgcccatggc 4020 

Db 2521 ATGTGCCAGCCCAGCAGCCAATCAGGCTTCACCTGTGAATGTGAGGAAGGGTGGATGGGG 2580 

Qy 4021 acatgccagcccagcagccaggcaggcttcacctgcgagtgccaggaaggatggatgggg 4080 

Db 2581 CCCCTCTGTGACCAGAGAACCAATGATCCCTGCCTCGGAAACAAATGTGTGCATGGGACC 2640 

Qy 4081 cccctctgtgaccaacggaccaatgacccttgccttggaaataaatgcgtacatggcacc 4140 

Db 2641 TGCCTGCCCATCAATGCCTTCTCCTATAGTTGCAAGTGCCTGGAGGGCCATGGCGGTGTC 2700 

Qy 4141 tgcttgcccatcaatgcgttctcctacagctgtaagtgcttggagggccatggaggtgtc 4200 

Db 2701 CTCTGTGATGAAGAAGAAGATCTCTTTAACCCCTGCCAGATGATCAAGTGCAAGCATGGG 2760 

Qy 4201 ctctgtgatgaagaggaggatctgtttaacccatgccaggcgatcaagtgcaagcatggg 4260 

Db 2761 AAGTGCAGGCTTTCTGGAGTGGGCCAGCCCTATTGTGAATGCAACAGTGGATTCACCGGG 2820 

Qy 4261 aagtgcaggctttcaggtctggggcagccctactgtgaatgcagcagtggatacacgggg 4320 
Db 2821 GACAGCTGTGATAGAGAAATTTCTTGTCGAGGGGAACGGATAAGGGACTATTACCAGAAG 2880 

IIMMIIIIII lllllll lllllllllllllll MINI I! MINIM III 
Qy 4321 gacagctgtgatcgagaaatctcttgtcgaggggaaaggataagagattattaccaaaag 4380 

Db 2881 CAGCAGGGTTACGCTGCCTGTCAAACAACTAAGAAAGTATCTCGCTTGGAATGCAGAGGC 2940 

IIMM || MM || IIIIIMI III || || || || || llllll 
Qy 4381 cagcagggctatgctgcttgccaaacaaccaagaaggtgtcccgattagagtgcagaggt 4440 

Db 2941 GGGTGCGCTGGAGGCCAGTGCTGTGGACCTCTGAGAAGCAAGAGGCGGAAATACTCTTTC 3000 

inn ii inn iiiiiiiiiiiiii inn iiiiii iiiiiiiiiiiiinii 

Qy 4441 gggtgtgcaggagggcagtgctgtggaccgctgaggagcaagcggcggaaatactctttc 4500 

Db 3001 GAATGCACAGATGGCTCCTCATTTGTGGACGAGGTTGAGAAAGTGGTGAAGTGCGGCTGC 3060 

MIMI II IMIIIII Mill II 1 11 1 M I II I M I II M 

Qy 4501 gaatgcactgacggctcctcctttgtggacgaggttgagaaagtggtgaagtgcggctgt 4560 

Db 3061 GCGAGATGTGCCTCCTAAGCGCGTCTCTAGAAGCTTCTAGCTTCGGCGAAGGTTGTACAC 3120 

MM UN llllll II II Mil II III II lllllllll || 
Qy 4561 acgaggtgtgtgtcctaaacacactcccggcagct-ctgtctttggaaaaggttgtatac 4619 

Db 3121 TTCTTGACCATGTTGGACTAATTCATGCTTCATAATGGAAATATTTGAAATATATTGTAA 3180 

MMIIIMM IIMM HIIIIIIII IIMIIIIIIIIIIIIIIIIIIIM 
Qy 4620 ttcttgaccatgtgggactaatgaatgcttcatagtggaaatatttgaaatatattgtaa 4679 

Db 3181 AATACAGAACAGACTTATTTTTATTATGATAATAAAGACTTGT---CTGCATTTGGAAAA 3237 

IIIIIIIMIIIIIIIMIIIIIIIIIII lllllllllll I IIIIIIIIIIIIII 
Qy 4680 aatacagaacagacttatttttattatgagaataaagactttttttctgcatttggaaaa 4739 

Db 3238 AAATAATAATAAAA 3251 

III II II INI 
Qy 4740 aaaaaaaaaaaaaa 4753 






RESULT 4 

LOCUS AB011531 5210 bp mRNA ROD 22-AUG-1998 

DEFINITION Rattus norvegicus mRNA for MEGF5, complete cds. 

ACCESSION AB011531 

NID g3449291 

VERSION AB011531.1 GI:3449291 

KEYWORDS MEGF5. 

SOURCE Rattus norvegicus (strain:Sprague-Dawley) adult brain cDNA to mRNA, 
clone_lib:pSPORT 1 clone:RG2635. 
ORGANISM Rattus norvegicus 

Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; 
Rodentia; Sciurognathi; Muridae; Murinae; Rattus. 
REFERENCE 1 (bases 1 to 5210) 
AUTHORS Nakayama,M., Nakajima,D. and Ohara,O. 
TITLE Direct Submission 

JOURNAL Submitted (26-FEB-1998) to the DDBJ/EMBL/GenBank databases. Manabu 
Nakayama, Kazusa DNA Research Institute, Laboratory of DNA 
technology; 1532-3, Yana, Kisarazu, Chiba 292-0812, Japan 
(E-mail:mmanabu@kazusa.or.jp, Tel:+81-438-52-3915, 
Fax:+81-438-52-3914) 
REFERENCE 2 (sites) 

AUTHORS Nakayama,M., Nakajima,D., Nagase,T., Nomura,N., Seki,N. and 
Ohara,O. 

TITLE Identification of high-molecular-weight proteins with multiple 
EGF-like motifs by motif-trap screening 
JOURNAL Genomics 51 (1), 27-34 (1998) 
MEDLINE 98360089 
FEATURES Location/Qualifiers 
source 1..5210 
/organism="Rattus norvegicus" 
/strain="Sprague-Dawley" 
/db_xref="taxon:10116" 
/clone="RG2635" 
/clone_lib="pSPORT 1" 
/dev_stage="adult" 
/tissue_type="brain" 
gene 292..4863 
/gene="MEGF5" 
CDS 292..4863 
/gene="MEGF5" 
/note="rat homologue of Drosophila slit protein; rat 
slit2" 
/codon_start=1 
/product="MEGF5" 
/protein_id="BAA32461.1" 
/db_xref="PID:d1033424" 
/db_xref="PID:g3449292" 
/db_xref="GI:3449292" 
/translation="MAPGRTGAGAAVRARLALALALASILSGPPAAACPTKCTCSAAS 
VDCHGLGLRAVPRGIPRNAERLDLDRNNITRITKMDFTGLKNLRVLHLEDNQVSVIER 
GAFQDLKQLERLRLNKNKLQVLPELLFQSTPKLTRLDLSENQIQGIPRKAFRGVTGVK 
NLQLDNNHISCIEDGAFRALRDLEILTLNNNNISRILVTSFNHMPKIRTLRLHSNHLY 
CDCHLAWLSDWLRQRRTIGQFTLCMAPVHLRGFSVADVQKKEYVCPGPHSEAPACNAN 
SLSCPSACSCSNNIVDCRGKGLTEIPANLPEGIVEIRLEQNSIKSIPAGAFIQYKKLK 
RIDISKNQISDIAPDAFQGLKSLTSLVLYGNKITEIPKGLFDGLVSLQLLLLNANKIN 
CLRVNTFQDLQNLNLLSLYDNKLQTISKGLFAPLQSIQTLHLAQNPFVCDCHLKWLAD 
YLQDNPIETSGARCSSPRRLANKRISQIKSKKFRCSGSEDYRNRFSSECFMDLVCPEK 
CRCEGTIVDCSNQKLSRIPSHLPEYTTDLRLNDNDIAVLEATGIFKKLPNLRKINLSN 
NRIKEVREGAFDGAAGVQELMLTGNQLETMHGRMFRGLSGLKTLMLRSNLISCVNNDT 
FAGLSSVRLLSLYDNRITTISPGAFTTLVSLSTINLLSNPFNCNCHMAWLGRWLRKRR 
IVSGNPRCQKPFFLKEIPIQDVAIQDFTCEGNEENSCQLSPRCPEQCTCVETVVRCSN 
RGLHTLPKGMPKDVTELYLEGNHLTAVPKELSTFRQLTLIDLSNNSISMLTNHTFSNM 
SHLSTLILSYNRLRCIPVHAFNGLRSLRVLTLHGNDISSVPEGSFNDLTSLSHLALGI 
NPLHCDCSLRWLSEWIKAGYKEPGIARCSSPESMADRLLLTTPTHRFQCKGPVDINIV 
AKCNACLSSPCRNNGTCSQDPVEQYRCTCPYSYKGKDCTVPINTCVQNPCQHGGTCHL 
SESHRDGFSCSCPLGFEGQRCEINPDDCEDNDCENSATCVDGINNYACVCPPNYTGEL 
CDEVIDYCVPEMNLCQHEAKCISLDKGFRCECVPGYSGKLCETDNDDCVAHKCRHGAQ 
CVDAVNGYTCICPQGFSGLFCEHPPPMVLLQTSPCDQYECQNGAQCIVVQQEPTCRCP 
PGFAGPRCEKLITVNFVGKDSYVELASAKVRPQANISLQVATDKDNGILLYKGDNDPL 
ALELYQGHVRLVYDSLSSPPTTVYSVETVNDGQFHSVELVMLNQTLNLVVDKGAPKSL 
GKLQKQPAVGINSPLYLGGIPTSTGLSALRQGADRPLGGFHGCIHEVRINNELQDFKA 
LPPQSLGVSPGCKSCTVCRHGLCRSVERDSVVCECHPGWTGPLCDQEAQDPCLGHSCS 
HGTCVATGNSYVCKCAEGYEGPLCDQRNDSANACSAFKCHHGQCHISDRGEPYCLCQP 
GFSGNHCEQENPCLGEIVREAIRRQKDYASCATASKVPIMVCRGGCGSQCCQPIRSKR 
RKYVFQCTDGSSFVEEVERHLECGCRECS" 
polyA_site 5210 
/note="17 a nucleotides" 
BASE COUNT 1210 a 1583 c 1386 g 1031 t 
ORIGIN 

Query Match 26.3%; Score 1249; DB 32; Length 5210; 

Best Local Similarity 65.1%; Pred. No. 0.00e+00; 

Matches 2918; Conservative 0; Mismatches 1543; Indels 21; Gaps 17; 
Db 388 GCCTGCCCCACCAAGTGTACCTGCTCCGCCGCTAGCGTGGACTGCCACGGGCTGGGCCTG 447 

II Mill I MM I Mill I I I IMIIIII MINIM III 

Qy 79 gcgtgcccggcgcagtgctcttgctcgggcagcacagtggactgtcacgggctggcgctg 138 

Db 448 CGCGCCGTTCCCCGGGGCATCCCCCGCAACGCTGAGCGCCTTGACCTGGACAGAAATAAC 507 

Ml III III II 1 1 1 11 II 1 1 r I ( I Ml I II II I I IMIIIII 
Qy 139 cgcagcgtgcccaggaatatcccccgcaacaccgagagactggatttaaatggaaataac 198 

Db 508 ATCACCAGGATCACCAAAATGGACTTCACCGGCTTGAAGAATCTCCGAGTCTTGCATCTG 567 

Mill M II M III II II III II MM II II 

Qy 199 atcacaagaattacgaagacagattttgctggtcttagacatctaagagttcttcagctt 258 

Db 568 GAAGACAATCAGGTCAGCGTCATCGAGAGAGGCGCCTTCCAGGATCTGAAGCAGCTGGAG 627 

II Ml II I III III II Mill II IMIMMIM II I II III 
Qy 259 atggagaataagattagcaccattgaaagaggagcattccaggatcttaaagaactagag 318 

Db 628 CGATTACGTCTGAACAAGAACAAGCTCCAGGTCCTTCCAGAATTACTTTTCCAGAGCACA 687 

II I Ml I Mil M I II III I Mil II II II II I I II 

Qy 319 agactgcgtttaaacagaaatcaccttcagctgtttcctgagttgctgtttcttgggact 378 
Db 688 CCGAAGCTCACCAGACTAGATCTGAGCGAAAACCAGATCCAGGGCATCCCGAGGAAGGCG 747 

lllllll III II Mill II IMIIIII II Mil Mill Mill II 
Qy 379 gcgaagctatacaggcttgatctcagtgaaaaccaaattcaggcaatcccaaggaaagct 438 

Db 748 TTCAGGGGCGTCACGGGCGTGAAGAACCTGCAACTGGACAATAACCACATCAGCTGCATT 807 

III I II I I I I II II HIIIIIIII I Mill IIIIIMI III 

Qy 439 ttccgtggggcagttgacataaaaaatttgcaactggattacaaccagatcagctgtatt 498 

Db 808 GAAGATGGAGCCTTCCGAGCGCTGCGCGATTTGGAGATCCTCACCCTTAACAACAACAAC 867 
iiiinii ii ill i ii ii ii ii mi i inn n iiiii nun 

Qy 499 gaagatggggcattcagggctctccgggacctggaagtgctcactctcaacaataacaac 558 

Db 868 ATCAGCCGTATCCTGGTCACCAGCTTCAACCACATGCCGAAGATCCGGACCCTGCGTCTC 927 

IN I I II I II IIIIHII Mill II I Mil | || || 
Qy 559 attactagactttctgtggcaagtttcaaccatatgcctaaacttaggacttttcgactg 618 

Db 928 CACTCCAACCACCTGTACTGTGATTGTCACTTGGCCTGGCTCTCAGACTGGCTGCGACAG 987 

ii ii in mini iiiii ii iii iiiinniini i ii i 

Qy 619 cattcaaacaacctgtattgtgactgccacctggcctggctctccgactggcttcgcaaa 678 

Db 988 CGGCGGACCATTGGCCAGTTCACCCTCTGCATGGCGCCCGTGCACCTGAGAGGCTTCAGT 1047 

in mi i ii in i ii mi in minium i i 

Qy 679 aggcctcgggttggtctgtacactcagtgtatgggcccctcccacctgagaggccataat 738 

Db 1048 GTAGCAGACGTGCAGAAGA-AGGA---GTATGT-GTGTCCAGGTCCCCACTCAG-AGGC- 1100 

•IIIII II II II II II I II II III III III II I 

Qy 739 gtagccgaggttcaaaaacgagaatttgtctgcagtgatgaggaagaaggtcaccagtca 798 

Db 1101 TCCA-GCCTGCAATGCCAACTCCCTCTC-CTGCCCTTCTGCCTGCTCATGCAGTAATAAC 1158 

I I I II I II II lllllll I IIIII I II II II II 

Qy 799 tttatggctccttcttgtagtgttttgcactgccctgccgcctgtacctgtagcaacaat 858 

Db 1159 ATTGTAGACTGCCGTGGGAAGGGACTGACTGAGATCCCTGCCAACCTGCCGGAGGGCATC 1218 

ii iiiinii minim ii ii iiimnin i n ii ii iii mi 

Qy 859 atcgtagactgtcgtgggaaaggtctcactgagatccccacaaatcttccagagaccatc 918 
Db 1219 GTGGAAATACGCCTAGAACAGAACTCCATCAAATCCATTCCTGCAGGAGCTTTCATCCAG 1278 

mmii i imiiiii i nun in m i iiniiin i 

Qy 919 acagaaatacgtttggaacagaacacaatcaaagtcatccctcctggagctttctcacca 978 
Db 1279 TACAAAAAACTGAAGCGAATAGACATCAGCAAGAATCAGATATCGGACATTGCTCCAGAT 1338 

n iiiii ii i inn in i inn imiiiii ii ii mi inin 

Qy 979 tataaaaagcttagacgaattgacctgagcaataatcagatctctgaacttgcaccagat 1038 
Db 1339 GCCTTCCAGGGCCTGAAATCACTCACGTCGCTGGTGCTCTATGGGAACAAGATCACAGAG 1398 

ii iiiii ii ii ii ii i ii ii ii i ii ii iinnn 

Qy 1039 gctttccaaggactacgctctctgaattcacttgtcctctatggaaataaaatcacagaa 1098 
Db 1399 ATTCCCAAAGGACTGTTTGACGGGCTGGTGTCCCTGCAGCTGCTCCTCCTCAATGCCAAC 1458 

i mm i i inn i m i in i inn n i i imiiiii 

Qy 1099 ctccccaaaagtttatttgaaggactgttttccttacagctcctattattgaatgccaac 1158 
Db 1459 AAGATCAACTGCCTGCGGGTAAACACCTTCCAGGACCTACAGAACCTCAATCTGCTCTCT 1518 

inn I'M 1 mm i i n nm n n in i n n inn 

Qy 1159 aagataaactgccttcgggtagatgcttttcaggatctccacaacttgaaccttctctcc 1218 
Db 1519 CTGTATGACAACAAGTTGCAGACTATCAGCAAAGGGCTCTTTGCTCCGCTGCAGTCCATC 1578 

ii i m 1 1 1 ( 1 1 1 1 1 i iiiii iii iii iii nn i ii ii 1 1 mi 

Qy 1219 ctatatgacaacaagcttcagaccatcgccaaggggaccttttcacctcttcgggccatt 1278 
Db 1579 CAGACCCTCCACTTAGCTCAAAACCCGTTTGTTTGCGACTGCCACTTGAAGTGGTTGGCC 1638 

n ii i n n ii ii nm in mi inn i mm i n 

Qy 1279 caaactatgcatttggcccagaacccctttatttgtgactgccatctcaagtggctagcg 1338 
Db 1639 GACTACCTCCAGGACAACCCCATTGAGACGAGCGGGGCCCGCTGCAGCAGCCCACGCCGG 1698 

n n nm mm mum n n nm mi nn nm 

Qy 1339 gattatctccataccaacccgattgagaccagtggtgcccgttgcaccagcccccgccgc 1398 
Db 1699 CTGGCCAACAAGCGCATCAGCCAGATCAAAAGCAAGAAGTTCCGCTGCTCAGGCTCGGAG 1758 

mn iiiii i n i mm! nm ii inn i n 

Qy 1399 ctggcaaacaaaagaattggacagatcaaaagcaagaaattccgttgttcaggtacagaa 1458 

Db 1759 GATTATCGCAACAGATTCAGCAGCGAGTGCTTCATGGACCTAGTGTGCCCCGAGAAGTGC 1818 

IIIMIII I III II I II INN III II I Mill II IIIII 
Qy 1459 gattatcgatcaaaattaagtggagactgctttgcggatctggcttgccctgaaaagtgt 1518 

Db 1819 CGTTGTGAGGGCACCATTGTGGACTGCTCCAACCAGAAGCTCTCCCGCATCCCGAGCCAC 1878 

ii iiiii ii nn ii n mm n ii inin i inin in 

Qy 1519 cgctgtgaaggaaccacagtagattgctctaatcaaaagctcaacaaaatcccggagcac 1578 

Db 1879 CTCCCTGAATATACCACTGACCTGCGACTGAACGACAATGACATCGCTGTGCTGGAGGCC 1938 
III Mill I II III! II II I IIIII I I III INI III 






Qy 1579 attccccagtacactgcagagttgcgtctcaataataatgaatttaccgtgttggaagcc 1638 



Db 1939 ACTGGGATCTTCAAGAAGTTGCCCAACCTGAGGAAAATAAACTTGAGCAATAATAGGATC 1998 

ii ii mn iiiii i ii i i i minimi nm nn mi 

Qy 1639 acaggaatctttaagaaacttcctcaattacgtaaaataaactttagcaacaataagatc 1698 

Db 1999 AAGGAGGTGCGGGAGGGTGCGTTTGATGGAGCGGCAGGCGTGCAGGAACTGATGCTGACG 2058 
i ii i mm II IIIII IIIII IIIII I III I I II III 
Qy 1699 acagatattgaggagggagcatttgaaggagcatctggtgtaaatgaaatacttcttacg 1758 

Db 2059 GGGAACCAGCTGGAGACCATGCACGGACGCATGTTCCGGGGCCTCAGCGGCCTGAAGACA 2118 

i ii i mi i mi mm in i nn n n 

Qy 1759 agtaatcgtttggaaaatgtgcagcataagatgttcaagggattggaaagcctcaaaact 1818 
Db 2119 CTGATGTTAAGGAGCAACCTGATCAGCTGCGTAAACAATGACACCTTCGCTGGCCTAAGC 2178 

iiiiiii ii nm i n i m ii mini in n n n 

Qy 1819 ttgatgttgagaagcaatcgaataacctgtgtggggaatgacagtttcataggactcagt 1878 
Db 2179 TCTGTGAGACTGCTGTCCCTCTATGACAATCGGATCACCACCATCAGCCCTGGAGCCTTC 2238 

mm i nn n i mi iiii ii ii ii i ii ii ii ii 

Qy 1879 tctgtgcgtttgctttctttgtatgataatcaaattactacagttgcaccaggggcattt 1938 
Db 2239 ACCACGCTCGTCTCCCTGTCCACCATAAACCTCTTGTCTAACCCTTTCAACTGCAACTGT 2298 

n iii ii i n ii minimi i n inn nm nm 

Qy 1939 gatactctccattctttatctactctaaacctcttggccaatccttttaactgtaactgc 1998 
Db 2299 CACATGGCCTGGCTCGGCCGGTGGCTGAGGAAACGCCGCATCGTAAGCGGGAACCCCAGA 2358 

n mi m i ii imnm n i nn i n n n in 

Qy 1999 tacctggcttggttgggagagtggctgagaaagaagagaattgtcacgggaaatcctaga 2058 
Db 2359 TGTCAGAAGCCCTTTTTCCTCAAGGAGATTCCTATCCAAGATGTGGCCATCCAGGACTTC 2418 

nm n ii i mn ii ii ii ii iiiii minimi iniinii 

Qy 2059 tgtcaaaaaccatacttcctgaaagaaatacccatccaggatgtggccattcaggacttc 2118 

Db 2419 ACCTGTGA--A-GGCAATGAAGAGAACAGCTGTCAGCTGAGTCCACGCTGCCCCGAGCAG 2475 
II Mill I II IIIII II II II II I II IIIII II I 
Qy 2119 acttgtgatgacggaaatgatgacaatagttgctccccactttctcgctgtcctactgaa 2178 

Db 2476 TGTACCTGTGTGGAGACAGTGGTGCGATGCAGCAACAGGGGTCTCCACACCCTCCCCAAG 2535 

nm ii iiii nm n mn mini mi m i i n n 

Qy 2179 tgtacttgcttggatacagtcgtccgatgtagcaacaagggtttgaaggtcttgccgaaa 2238 

Db 2536 GGCATGCCAAAGGACGTGACTGAACTGTACCTGGAAGGAAATCATTTAACGGCGGTGCCC 2595 
II II IIII II II II II IIII IIIII IIIII II II II III III 
Qy 2239 ggtattccaagagatgtcacagagttgtatctggatggaaaccaatttacactggttccc 2298 

Db 2596 AAAGAATTGTCCACCTTCCGACAGCTGACACTAATTGACCTGAGCAACAACAGCATCAGT 2655 
II III I IIII II I III I IIIII II III I II llllllll II II 
Qy 2299 aaggaactctccaactacaaacatttaacacttatagacttaagtaacaacagaataagc 2358 

Db 2656 ATGCTGACCAATCACACCTTCAGCAACATGTCCCACCTCTCCACACTGATCCTGAGCTAC 2715 

i iii i nm i 1 1 1 1 u 1 1 1 1 1 : i nn in in i n n n m 

Qy 2359 acgctttctaatcagagcttcagcaacatgacccagctcctcaccttaattcttagttac 2418 

Db 2716 AACCGGCTGAGATGCATCCCGGTCCATGCCTTCAACGGGCTAAGGTCACTCCGAGTGCTA 2775 
HIM llllllll II II I Nil I II III III II III I II 
Qy 2419 aaccgtctgagatgtattcctcctcgcacctttgatggattaaagtctcttcgattactt 2478 

Db 2776 ACCCTCCATGGCAATGACATTTCCAGTGTTCCTGAAGGCTCCTTCAATGATCTGACGTCC 2835 

i ii iiiii minimi m mum i imimni i i 

Qy 2479 tctctacatggaaatgacatttctgttgtgcctgaaggtgctttcaatgatctttctgca 2538 
Db 2836 CTCTCCCACCTGGCCCTGGGAATCAACCCTCTCCACTGTGACTGCAGTCTGCGTTGGTTA 2895 

i n ii ii ii i iii iniimi iiiiiii ii i iii mm 

Qy 2539 ttatcacatctagcaattggagccaaccctctttactgtgattgtaacatgcagtggtta 2598 
Db 2896 TCAGAGTGGATAAAGGCTGGGTACAAGGAGCCTGGCATTGCCAGATGCAGTAGCCCCGAG 2955 

ii ii iii i iii i i ii immiiii nm i n i i n i 

Qy 2599 tccgactgggtgaagtcggaatataaggagcctggaattgctcgttgtgctggtcctgga 2658 
Db 2956 TCCATGGCAGACAGACTCCTGCTCACCACCCCCACCCACAGGTTCCAGTGCAAAGGGCCA 3015 

iniim i in i mn ii m n 1 1 n n nn n 

Qy 2659 gaaatggcagataaacttttactcacaactccctccaaaaaatttacctgtcaaggtcct 2718 
Db 3016 GTGGACATTAACATCGTGGCCAAGTGCAACGCCTGCCTCTCTAGCCCATGCAAGAACAAC 3075 

mil i ii ii i ii mil ill Mini; n i : n n i i 

Qy 2719 gtggatgtcaatattctagctaagtgtaacccctgcctatcaaatccgtgtaaaaatgat 2778 

Db 3076 GGCACTTGCAGCCAGGATCCGGTGGAGCAGTACCGCTGTACCTGCCCTTACAGCTACAAG 3135 
HIM M I Mill II M Mill II Mill II II II llll 

Qy 2779 ggcacatgtaatagtgatccagttgacttttaccgatgcacctgtccatatggtttcaag 2838 

Db 3136 GGCAAGGACTGCACCGTGCCCATCAACACCTGCGTCCAGAACCCCTGTCAGCACGGAGGC 3195 
II IIIIIM II M II I HIM II Mill III I II Mill 

Qy 2839 gggcaggactgtgatgtcccaattcatgcctgcatcagtaacccatgtaaacatggagga 2898 

Db 3196 ACCTGTCACCTCAGTGAGAGCCACAGAGATGGGTTCAGCTGCTCCTGCCCCCTGGGCTTT 3255 
U II III I I II I I MM III IN HI || Ml 

Qy 2899 acttgccacttaaaggaaggagaagaagatggattctggtgtatttgtgctgatggattt 2958 

Db 3256 GAGGGACAGCGGTGTGAGATCAACCCAGATGACTGTGAGGATAACGACTGTGAAAACAGC 3315 

ii in i inn inn inn in inn minimi i 

Qy 2959 gaaggagaaaattgtgaagtcaacgttgatgattgtgaagataatgactgtgaaaataat 3018 

Db 3316 GCCACCTGTGTGGACGGGATCAACAACTACGCATGCGTGTGCCCGCCGAACTACACAGGG 3375 

i ii mil ii ii ii ii mm inn i nm n i n in 

Qy 3019 tctacatgtgtcgatggcattaataactacacatgcctttgcccacctgagtatacaggt 3078 

Db 3376 GAGCTGTGTGATGAGGTGATTGACTACTGCGTGCCCGAGATGAACCTCTGTCAGCACGAG 3435 

in mini in i i iiii in i i ii mm m imiin 

Qy 3079 gagttgtgtgaggagaagctggacttctgtgcccaggacctgaacccctgccagcacgat 3138 

Db 3436 GCCAAGTGCATCTCCCTGGACAAAGGATTCAGGTGCGAATGTGTCCCTGGCTACAGTGGA 3495 

i Minn ii niiin n n n n n in n 

Qy 3139 tcaaagtgcatcctaactccaaagggattcaaatgtgactgcacaccagggtacgtaggt 3198 

Db 3496 AAGCTGTGCGAGACAGACAACGATGACTGTGTGGCTCACAAGTGTCGCCACGGAGCCCAG 3555 

I 1 nm 1 11 11 nm 1 11:111 1 : 1 1 1 1 1 1 [ 1 

Qy 3199 gaacactgcgacatcgattttgacgactgccaagacaacaagtgtaaaaacggagcccac 3258 

Db 3556 TGTGTGGACGCGGTCAATGGCTACACGTGCATCTGCCCCCAGGGCTTCAGCGGGCTCTTC 3615 

II 11 11 11 11 inn nun nnii i n i in n i in 

Qy 3259 tgcacagatgcagtgaacggctatacgtgcatatgccccgaaggttacagtggcttgttc 3318 

Db 3616 TGTGAGCACCCCCCACCCATGGTTCTGCTACAGACCAGCCCCTGTGACCAGTACGAGTGC 3675 

mm i iiiiiiinn n i i nnmiiimi 1 1 iiii 

Qy 3319 tgtgagttttctccacccatggtcctccctcgtaccagcccctgtgataattttgattgt 3378 

Db 3676 CAGAATGGGGCGCAGTGCATCGTGGTACAGCAGGAGCCCACTTGCCGCTGTCCCCCAGGC 3735 

imiiii ii nm nm i i nm i n i in n in 

Qy 3379 cagaatggagctcagtgtatcgtcagaataaatgagccaatatgtcagtgtttgcctggc 3438 

Db 3736 TTCGCTGGGCCCAGGTGTGAGAAGCTCATCACTGTGAATTTCGTGGGCAAGGACTCCTAT 3795 

i ii i mm ii i i i ininni i iii ii ii ni 

Qy 3439 tatcagggagaaaagtgtgaaaaattggttagtgtgaattttataaacaaagagtcttat 3498 

Db 3796 GTGGAACTGGCCTCTGCCAAGGTCCGGCCCCAGGCCAACATCTCCCTGCAGGTGGCCACT 3855 

i i i i ii nnii nm in i nm i n in i nm 

Qy 3499 cttcagattccttcagccaaggttcggcctcagacgaacataacacttcagattgccaca 3558 

Db 3856 GACAAGGACAACGGCATCCTTCTTTACAAGGGAGACAACGACCCCCTGGCACTGGAGCTG 3915 

ii i mi iii inn n n mm: iiiii iiii in in n 

Qy 3559 gatgaagacagcggaatcctcctgtataagggtgacaaagaccatatcgcggtagaactc 3618 

Db 3916 TACCAGGGTCACGTGAGGCTGGTGTATGACAGCCTGAGCTCCCCTCCGACCACGGTGTAC 3975 
II I HI I II I MUM | II I I I I III 

Qy 3619 tatcgggggcgtgttcgtgccagctatgacaccggctctcatccagcttctgccatttac 3678 

Db 3976 AGTGTGGAGACGGTGAATGATGGCCAGTTTCACAGTGTGGAGCTGGTGATGCTAAACCAG 4035 

1 1 1 1 1 1 m 1 1 1 i nun i ii mi inn n i i i in 

Qy 3679 agtgtggagacaatcaatgatggaaacttccacattgtggaactacttgccttggatcag 3738 

Db 4036 ACCCTGAACCTGGTGGTAGACAAAGGAGCCCCCAAGAGCCTGGGGAAGCTCCAGAAGCAG 4095 

i ii ii n ii ii mm iii ni mm 

Qy 3739 agtctctctttgtccgtggatggtgggaaccccaaaatcatcactaacttgtcaaagcag 3798 



Db 4096 CCAGCAGTGGGCATCAACAGTCCCCTCTATCTTGGAGGCATCCCCACGTCTACAGGCCTC 4155 

I I II I II III llllll I MINIM II I I II 

Qy 3799 tccactctgaattttgactctccactctatgtaggaggcatgccagggaagagtaacgtg 3858 

Db 4156 TCAGCCTTACGCCAGGGTGCAGACAGGCCGCTGGGCGGCTTCCACGGCTGTATACACGAA 4215 

n i i mini I I I I -I j 1 1 1 1 1 1 1 1 1 1 1 III I 
Qy 3859 gcatctctgcgccaggcccctgggcagaacggaaccagcttccacggctgcatccggaac 3918 

Db 4216 GTGCGCATCAACAACGAGTTGCAGGATTTCAAAGCCCTCCCACCCCAGTCCCTGGGGGTC 4275 

i imiiii iii mini in i i ii n i i 

Qy 3919 ctttacatcaacagtgagctgcaggacttccagaaggtgccgatgcaaacaggcattttg 3978 

Db 4276 TCTCCCGGCT-GCAAATCCTGCACT--GTGTGTCGTCACGGCCTGTGTCGTTCCGTGGAG 4332 

II I I II Ml I llllll II III || | || 

Qy 3979 cctggctgtgagccatgccacaagaaggtgtgtgcccatggcacatgccagcccagcagc 4038 

Db 4333 AAGGACAGCGTAGTGTGTGAGTGCCATCCGGGATGGACTGGTCCACTGTGTGATCAGGAA 4392 

in ii i ii iiiini Mini n ii n nm n 

Qy 4039 caggcaggcttcacctgcgagtgccaggaaggatggatggggcccctctgtgaccaacgg 4098 
Db 4393 GCCCAGGACCCCTGCCTTGGTCACAGCTGCAGCCATGGGACATGCGTGGCGA-CTG-GCA 4450 

ii i nm mum i i iii iiiii ii in ii i i i ii 

Qy 4099 accaatgacccttgccttggaaataaatgcgtacatggcacctgcttgcccatcaatgcg 4158 

Db 4451 A-CTCATATGTGTGCAAGTGTGCCGAGGGCTATGAGGGACCTTTGTGTGACCAGAAGAAT 4509 
III II II IIIII llllll III II I IIIII I II I 
Qy 4159 ttctcctacagctgtaagtgcttggagggccatggaggtgtcctctgtgatgaagaggag 4218 

Db 4510 GACTCTGCCAATGCCTGCTCAGCCTTCAAGTGCCACCACGGGCAGTGCCACATCTCAGAT 4569 

II II I III II IMIIII I II III IIIII I IIII I 

Qy 4219 gatctgtttaacccatgccaggcgatcaagtgcaagcatgggaagtgcaggctttcaggt 4278 

Db 4570 CGAGGGGAGCCCTATTGCCTGTGCCAGCCTGGCTTCAGTGGCAATCACTGTGAGCAAGAG 4629 

i in iimn n iii iii i ii ii i mm i in 

Qy 4279 ctggggcagccctactgtgaatgcagcagtggatacacgggggacagctgtgatcgagaa 4338 

Db 4630 AATCCATGTTTGGGAGAGATAGTCCGTGAAGCCATCCGCCGCCAGAAAGATTATGCCTCC 4689 

I I HI IMII II II II III I I IIIII I 
Qy 4339 atctcttgtcgaggggaaaggataagagattattaccaaaagcagcagggctatgctgct 4398 

Db 4690 TGTGCCACAGCATCCAAGGTGCCCATCATGGTATGCCGCGG-GGGC-TGCGGGAGC-CAA 4746 

ii iii i nnn ii ii iii i ii in iii mi ii 

Qy 4399 tgccaaacaaccaagaaggtgtcccgattagagtgcagaggtgggtgtgcaggagggcag 4458 
Db 4747 TGCTGCCAGCCGATTCGGAGCAAGCGGCGGAAATACGTCTTCCAGTGCACTGACGGCTCC 4806 

nm iii i iiminmiiiiinii in i niiiiiininii 

Qy 4459 tgctgtggaccgctgaggagcaagcggcggaaatactctttcgaatgcactgacggctcc 4518 
Db 4807 TCGTTCGTGGAAGAGGTGGAGAGACACTTGGAATGTGGCTGT 4848 

ii ii nm iiiii iiii i ii i ii mm 

Qy 4519 tcctttgtggacgaggttgagaaagtggtgaagtgcggctgt 4560 
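
The line printed between each Db row and its Qy row marks the columns where the two sequences agree. As a minimal sketch, assuming a match is a case-insensitive identity between two non-gap characters, such a line can be recomputed from any row pair. The helper name `match_line` is hypothetical and is not part of the search program that produced this report.

```python
def match_line(db_row: str, qy_row: str) -> str:
    """Return a '|' for every column where both rows hold the same base.

    Assumption: gaps are written as '-' and never count as a match;
    comparison ignores letter case (Db rows are upper case, Qy rows lower).
    """
    if len(db_row) != len(qy_row):
        raise ValueError("aligned rows must have the same number of columns")
    marks = []
    for db_base, qy_base in zip(db_row, qy_row):
        same = db_base != "-" and db_base.upper() == qy_base.upper()
        marks.append("|" if same else " ")
    return "".join(marks)


if __name__ == "__main__":
    # First ten columns of the final row pair above (Db 4807 / Qy 4519).
    print(match_line("TCGTTCGTGG", "tcctttgtgg"))
```

Counting the `|` characters over all rows of one alignment should reproduce the "Matches" figure reported for that result, provided the rows themselves are free of scanning errors.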
RESULT 5 

LOCUS AB017167 5094 bp mRNA PRI 06 -F 
DEFINITION Homo sapiens mRNA for Slit-1 protein, complete cds. 
ACCESSION AB017167 
NID g4049584 
VERSION AB017167.1 GI:4049584 
KEYWORDS slit-1; Slit-1 protein. 
SOURCE Homo sapiens fetal tissue_lib:brain cDNA to mRNA. 
ORGANISM Homo sapiens 
Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; 
Primates; Catarrhini; Hominidae; Homo. 
REFERENCE 1 (bases 1 to 5094) 
AUTHORS Itoh,A. and Sakano,S. 
TITLE Direct Submission 
JOURNAL Submitted (27-AUG-1998) to the DDBJ/EMBL/GenBank databases. Akira 
Itoh, Asahi Chemical Industry co.,ltd., Life Science Fundamental 
Research Laboratory; 2-1, Samejima, Fuji, Shizuoka 416-8501, Japan 
(E-mail:a8611483@ut.asahi-kasei.co.jp, Tel:+81-545-62-3231, 
Fax:+81-545-62-3249) 
REFERENCE 2 
AUTHORS Itoh,A., Miyabayashi,T., Ohno,M. and Sakano,S. 
TITLE Cloning and expressions of three mammalian homologues of Drosophila 
slit suggest possible roles for Slit in the formation and 
maintenance of the nervous system 
JOURNAL Brain Res. Mol. Brain Res. 62 (2), 175-186 (1998) 
MEDLINE 99033071 
FEATURES Location/Qualifiers 
source 1..5094 
/organism="Homo sapiens" 
/db_xref="taxon:9606" 
/dev_stage="fetal" 
/tissue_lib="brain" 
gene 233..4837 
/gene="slit-1" 
CDS 233..4837 
/gene="slit-1" 
/note="human homologue of Drosophila slit gene" 
/codon_start=1 
/product="Slit-1 protein" 
/protein_id="BAA35184.1" 
/db_xref="PID:d1036170" 
/db_xref="PID:g4049585" 
/db_xref="GI:4049585" 
/translation="MALTPGWGSSAGPVRPELWLLLWAAAWRLGASACPALCTCTGTT 
VDCHGTGLQAIPKNIPRNTERLELNGNNITRIHKNDFAGLKQLRVLQLMENQIGAVER 
GAFDDMKELERLRLNRNQLHMLPELLFQNNQALSRLDLSENAIQAIPRKAFRGATDLK 
NLRLDKNQISCIEEGAFRALRGLEVLTLNNNNITTIPVSSFNHMPKLRTFRLHSNHLF 
CDCHLAWLSQWLRQRPTIGLFTQCSGPASLRGLNVAEVQKSEFSCSGQGEAGRVPTCT 
LSSGSCPAMCTCSNGIVDCRGKGLTAIPANLPETMTEIRLELNGIKSIPPGAFSPYRK 
LRRIDLSNNQIAEIAPDAFQGLRSLNSLVLYGNKITDLPRGVFGGLYTLQLLLLNANK 
INCIRPDAFQDLQNLSLLSLYDNKIQSLAKGTFTSLRAIQTLHLAQNPFICDCNLKWL 
ADFLRTNPIETSGARCASPRRLANKRIGQIKSKKFRCSAKEQYFIPGTEDYQLNSECN 
SDVVCPHKCRCEANVVECSSLKLTKIPERIPQSTAELRLNNNEISILEATGMFKKLTH 
LKKINLSNNKVSEIEDGAFEGAASVSELHLTANQLESIRSGMFRGLDGLRTLMLRNNR 
ISCIHNDSFTGLRNVRLLSLYDNQITTVSPGAFDTLQSLSTLNLLANPFNCNCQLAWL 
GGWLRKRKIVTGNPRCQNPDFLRQIPLQDVAFPDFRCEEGQEEGGCLPRPQCPQECAC 
LDTVVRCSNKHLRALPKGIPKNVTELYLDGNQFTLVPGQLSTFKYLQLVDLSNNKISS 
LSNSSFTNMSQLTTLILSYNALQCIPPLAFQGLRSLRLLSLHGNDISTLQEGIFADVT 
SLSHLAIGANPLYCDCHLRWLSSWVKTGYKEPGIARCAGPQDMEGKLLLTTPAKKFEC 
QGPPTLAVQAKCDLCLSSPCQNQGTCHNDPLEVYRCACPSGYKGRDCEVSLNSCSSGP 
CENGGTCHAQEGEDAPFTCSCPTGFEGPTCGVNTDDCVDHACANGGVCVDGVGNYTCQ 
CPLQYEGKACEQLVDLCSPDLNPCQHEAQCVGTPDGPRCECMPGYAGDNCSENQDDCR 
DHRCQNGAQCMDEVNSYSCLCAEGYSGQLCEIPPHLPAPKSPCEGTECQNGANCVDQG 
NRPVCQCLPGFGGPECEKLLSVNFVDRDTYLQFTDLQNWPRANITLQVSTAEDNGILL 
YNGDNDHIAVELYQGHVRVSYDPGSYPSSAIYSAETINDGQFHTVELVAFDQMVNLSI 
DGGSPMTMDNFGKHYTLNSEAPLYVGGMPVDVNSAAFRLWQILNGTGFHGCIRNLYIN 
NELQDFTKTQMKPGVVPGCEPCRKLYCLHGICQPNATPGPMCHCEAGWVGLHCDQPAD 
GPCHGHKCVHGQCVPLDALSYSCQCQDGYSGALCNQAGALAEPCRGLQCLHGHCQASG 
TKGAHCVCDPGFSGELCEQESECRGDPVRDFHQVQRGYAICQTTRPLSWVECRGSCPG 
QGCCQGLRLKRRKFTFECSDGTSFAEEVEKPTKCGCALCA" 
BASE COUNT 1037 a 1616 c 1508 g 933 t 
ORIGIN 



Query Match 25.9%; Score 1231; DB 29; Length 5094; 
Best Local Similarity 65.0%; Pred. No. 0.00e+00; 
Matches 2926; Conservative 0; Mismatches 1551; Indels 24; Gaps 17; 
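
The "Best Local Similarity" figure for RESULT 5 is consistent with a simple ratio of the counts reported alongside it. The sketch below assumes, without confirmation from the search program's documentation, that the percentage is the number of matching columns divided by all aligned columns (matches plus mismatches plus indels); `local_similarity` is a hypothetical name.

```python
def local_similarity(matches: int, mismatches: int, indels: int) -> float:
    """Percent of aligned columns that are identical.

    Assumed definition: matches / (matches + mismatches + indels).
    """
    columns = matches + mismatches + indels
    if columns == 0:
        raise ValueError("alignment has no columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Counts reported for RESULT 5: 2926 matches, 1551 mismatches, 24 indels.
    print(f"{local_similarity(2926, 1551, 24):.1f}%")  # prints 65.0%
```

Under the same assumption, the RESULT 6 counts (2894 matches, 1566 mismatches, 21 indels) give 64.6%, which also agrees with the reported value.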



Db 328 GGCGTGCCCCGCCCTCTGCACCTGCACCGGAACCACGGTGGACTGCCACGGCACGGGGCT 387 

IIIIMIII II I III I III I II I III llllllll Mill II III 
Qy 78 ggcgtgcccggcgcagtgctcttgctcgggcagcacagtggactgtcacgggctggcgct 137 

Db 388 GCAGGCCATTCCCAAGAATATACCTCGGAACACCGAGCGCCTGGAACTCAATGGCAACAA 447 

II I I 1 1 1 1 MM II II IIIIMIII I Mill I Mill II II 
Qy 138 gcgcagcgtgcccaggaatatcccccgcaacaccgagagactggatttaaatggaaataa 197 

Db 448 CATCACTCGGATCCATAAGAATGACTTTGCGGGGCTCAAGCAGCTGCGGGTGCTGCAGCT 507 

llllll I II Nil II Mill II II I II II I II II Mill 
Qy 198 catcacaagaattacgaagacagattttgctggtcttagacatctaagagttcttcagct 257 

Db 508 GATGGAGAACCAGATTGGAGCAGTGGAACGTGGTGCTTTTGATGACATGAAGGAGCTGGA 567 

llllllll Mill I I I III I II II II I II I II II II II 
Qy 258 tatggagaataagattagcaccattgaaagaggagcattccaggatcttaaagaactaga 317 



Db 568 GCGGCTGCGACTGAACCGAAACCAGCTGCACATGTTACCGGAACTGCTGTTCCAGAACAA 627 

I I Mill I Ml MM II II II Mil II II lllllll I I 
Qy 318 gagactgcgtttaaacagaaatcaccttcagctgtttcctgagttgctgtttcttgggac 377 

Db 628 CCAGGCTTTGTCAAGACTGGACTTGAGTGAGAACGCCATCCAGGCCATCCCCAGGAAAGC 687 

I I I II IMI I Mill III II Mil lllll llllllll 
Qy 378 tgcgaagctatacaggcttgatctcagtgaaaaccaaattcaggcaatcccaaggaaagc 437 

Db 688 TTTTCGGGGAGCTACGGACCTTAAAAATTTACGGCTGGACAAGAACCAGATCAGCTGCAT 747 
III II II II III I llllllll I lllll I IMIMIMMHI II ' 
Qy 438 tttccgtggggcagttgacataaaaaatttgcaactggattacaaccagatcagctgtat 497 

Db 748 TGAGGAAGGGGCCTTCCGTGCTCTGCGGGGGCTGGAGGTGCTGACCCTGAACAACAACAA 807 

III II lllll III I Mill MM lllll lllll II II Mill lllll 
Qy 498 tgaagatggggcattcagggctctccgggacctggaagtgctcactctcaacaataacaa 557 

Db 808 TATCACCACCATCCCCGTGTCCAGCTTCAACCATATGCCCAAGCTACGGACCTTCCGCCT 867 

II II I I I III I II 1 1 1 1 1 II ! 1 1 1 1 1 1 || || Mil M II II 
Qy 558 cattactagactttctgtggcaagtttcaaccatatgcctaaacttaggacttttcgact 617 

Db 868 GCACTCCAACCACCTGTTTTGCGACTGCCACCTGGCCTGGCTCTCGCAGTGGCTGAGGCA 927 

III II III llllll III IIMIIIIIIMIIIIIMMII I Mill I I 
Qy 618 gcattcaaacaacctgtattgtgactgccacctggcctggctctccgactggcttcgcaa 677 

Db 928 GCGGCCAACCATCGGGCTCTTCACCCAGTGCTCGGGCCCAGCCAGCCTGCGTGGCCTCAA 987 

Mil llllll III Mil MM II MM I llll II 
Qy 678 aaggcctcgggttggtctgtacactcagtgtatgggcccctcccacctgagaggccataa 737 



Db 988 TGTGGCAGAGGTCCAGAAGAGTGAGTTCAGCTGCTCAGGCCAGGGAGAAGCGGGGC-G-C 1045 

III II lllll II II I II II Ml I Ml lllll I I I 
Qy 738 tgtagccgaggttcaaaaacgagaatttgtctgcagtgatgaggaagaaggtcaccagtc 797 

Db 1046 GTGCCCACCTGCACCCTGTCCTCCGGCTC-CTGCCCGGCCATGTGCACCTGCAGCAATGG 1104 

I II I I III I I MM III II lllll MM 

Qy 798 atttatggctccttcttgtagtgttttgcactgccctgccgcctgtacctgtagcaacaa 857 

Db 1105 CATCGTGGACTGTCGTGGAAAAGGCCTCACTGCCATCCCGGCCAACCTGCCCGAGACCAT 1164 

lllll MINIM MM MUM MM I II II II IIMM 
Qy 858 tatcgtagactgtcgtgggaaaggtctcactgagatccccacaaatcttccagagaccat 917 

Db 1165 GACGGAGATACGCCTGGAGCTGAACGGCATCAAGTCCATCCCTCCTGGAGCCTTCTCACC 1224 

II II Mil llll I llll lllll MMIMIMI llllllll 

Qy 918 cacagaaatacgtttggaacagaacacaatcaaagtcatccctcctggagctttctcacc 977 

Db 1225 CTACAGAAAGCTACGGAGGATAGACCTGAGCAACAATCAGATCGCTGAGATTGCACCCGA 1284 

II I MM I I II I III II MM IIIIMIII llll lllllll II 
Qy 978 atataaaaagcttagacgaattgacctgagcaataatcagatctctgaacttgcaccaga 1037 

Db 1285 CGCCTTCCAGGGCCTCCGCTCCCTGAACTCGCTGGTCCTCTATGGAAACAAGATCACAGA 1344 

II lllll II II MM MM II II MMIMIMI II IIMM 
Qy 1038 tgctttccaaggactacgctctctgaattcacttgtcctctatggaaataaaatcacaga 1097 

Db 1345 CCTCCCCCGTGGTGTGTTTGGAGGCCTATACACCCTACAGCTCCTGCTCCTGAATGCCAA 1404 

MM II I Ml III II I II MMIIIMI I MMIMM 
Qy 1098 actccccaaaagtttatttgaaggactgttttccttacagctcctattattgaatgccaa 1157 



Db 1405 CAAGATCAACTGCATCCGGCCCGATGCCTTCCAGGACCTGCAGAACCTCTCACTGCTCTC 1464 

llllll llllll I III Mil II Mil II II III I II IMI 
Qy 1158 caagataaactgccttcgggtagatgcttttcaggatctccacaacttgaaccttctctc 1217 

Db 1465 CCTGTATGACAACAAGATCCAGAGCCTCGCCAAGGGCACTTTCACCTCCCTGCGGGCCAT 1524 

III MINIMI! I llll I MIMMII II II I I II llllllll 
Qy 1218 cctatatgacaacaagcttcagaccatcgccaaggggaccttttcacctcttcgggccat 1277 

Db 1525 CCAGACTCTGCACCTGGCCCAGAACCCTTTCATTTGCGACTGTAACCTCAAGTGGCTGGC 1584 

II III llll llllllllllll! II lllll lllll I IMIIMIM II 
Qy 1278 tcaaactatgcatttggcccagaacccctttatttgtgactgccatctcaagtggctagc 1337 

Db 1585 AGACTTCCTGCGCACCAATCCCATCGAGACGAGTGGTGCCCGCTGTGCCAGTCCCCGGCG 1644 

II I II I MM II II MM IMIIMIM II Ml lllll II 
Qy 1338 ggattatctccataccaacccgattgagaccagtggtgcccgttgcaccagcccccgccg 1397 

Db 1645 CCTCGCCAACAAGCGCATCGGGCAGATCAAGAGCAAGAAGTTCCGGTGCTCAGCCAAAGA 1704 
Qy 1398 cctggcaaacaaaagaattggacagatcaaaagcaagaaattccgttgttcaggtacaga 1457 

Db 1705 GCAGTACTTCATTCCAGGCACGGAGGAmCCAGCTGAACAGCGAGTGCAACAGCGACGT 1764 

M Mill I I II I III! I || | || Ml | || | 
Qy 1458 --ag-a-tt-at-cgat-caa--a--attaagtggaga-ctgctt-tgc----g-gatct 1499 

Db 1765 GGTCTGTCCCCACAAGTGCCGCTGTGAGGCCAACGTGGTGGAGTGCTCCAGCCTGAAGCT 1824 

II II II I Hill llllllll II I II II Mill I I inn 

Qy 1500 ggcttgccctgaaaagtgtcgctgtgaaggaaccacagtagattgctctaatcaaaagct 1559 

Db 1825 CACCAAGATCCCTGAGCGCATCCCCCAGTCCACGGCAGAACTGCGATTGAATAACAATGA 1884 

ii m inn mi iii inn in inn mi i inn in 

Qy 1560 caacaaaatcccggagcacattccccagtacactgcagagttgcgtctcaataataatga 1619 

Db 1885 GATTTCCATCCTGGAGGCCACTGGGATGTTTAAAAAACTTACACATCTGAAGAAAATCAA 1944 

II II I Mil lllll II II UNI UNI! | || | ,11 || 
Qy 1620 atttaccgtgttggaagccacaggaatctttaagaaacttcctcaattacgtaaaataaa 1679 

Db 1945 TCTGAGCAACAACAAGGTGTCAGAAATTGAAGATGGGGCCTTCGAGGGCGCAGCCTCTGT 2004 

fl M 1 1 1 1 1 1 III I Mil lllll II || || || || || HI | IN 
Qy 1680 ctttagcaacaataagatcacagatattgaggagggagcatttgaaggagcatctggtgt 1739 
Db 2005 GAGCGAGCTGCACCTAACTGCCAACCAGCTGGAGTCCATCCGGAGCGGCATGTTCCGGGG 2064 

I II I I II II III INI | | | illlll Ml 

Qy 1740 aaatgaaatacttcttacgagtaatcgtttggaaaatgtgcagcataagatgttcaaggg 1799 

Db 2065 TCTGGATGGCTTGAGGACCCTAATGCTGCGGAACAACCGCATCAGCTGCATCCACAACGA 2124 

Ml! Ill I II I III II I I III II II I III I || || 
Qy 1800 attggaaagcctcaaaactttgatgttgagaagcaatcgaataacctgtgtggggaatga 1859 

Db 2125 CAGCTTCACGGGCCTGCGCAACGTCCGGCTCCTCTCGCTCTACGACAACCAGATCACCAC 2184 

III III! II II I II II I II II I I! II II II II II II 

Qy 1860 cagtttcataggactcagttctgtgcgtttgctttctttgtatgataatcaaattactac 1919 

Db 2185 CGTATCCCCAGGAGCCTTCGACACCCTCCAGTCCCTCTCCACACTGAATCTCCTGGCCAA 2244 

II I UNI II II II II lllll II I II II II II III lllllll 

Qy 1920 agttgcaccaggggcatttgatactctccattctttatctactctaaacctcttggccaa 1979 

Db 2245 CCCTTTCAACTGCAACTGCCAGCTGGCCTGGCTAGGAGGCTGGCTACGGAAGCGCAAGAT 2304 

inn inn linn i inn i i mi in i i i 

Qy 1980 tccttttaactgtaactgctacctggcttggttgggagagtggctgagaaagaagagaat 2039 
Db 2305 CGTGACGGGGAACCCGCGATGCCAGAACCCTGACTTTTTGCGGCAGATTCCCCTGCAGGA 2364 

II UNI II II MM M II II MM II I II Ml I MM! 
Qy 2040 tgtcacgggaaatcctagatgtcaaaaaccatacttcctgaaagaaatacccatccagga 2099 

Db 2365 CGTGGCCTTCCCTGACTTCAGGTGTGAGGAAGGCCAGGAGGAGGGGGGCTGCCTGCCCCG 2424 

• IIIMI I I lllllll Mill II II I II II I III || | 
Qy 2100 tgtggccattcaggacttcacttgtgatgacggaaatgatgacaatagttgctccccact 2159 
Db 2425 CCCACAGTGCCCACAGGAGTGCGCCTGCCTGGACACCGTGGTCCGATGCAGCAACAAGCA 2484 

I I II M II M I Ml III! II II llllllll lllllll 

Qy 2160 ttctcgctgtcctactgaatgtacttgcttggatacagtcgtccgatgtagcaacaaggg 2219 

Db 2485 CCTGCGGGCCCTGCCCAAGGGCATTCCCAAGAATGTCACAGAACTCTATTTGGACGGGAA 2544 

II M I MM II II Mill I MMIIIMI I III MM II II 

Qy 2220 tttgaaggtcttgccgaaaggtattccaagagatgtcacagagttgtatctggatggaaa 2279 

Db 2545 CCAGTTCACGCTGGTTCCGGGACAGCTGTCTACCTTCAAGTACCTGCAGCTCGTGGACCT 2604 

III M II llllllll I II III || Ml || || I Ml I 
Qy 2280 ccaatttacactggttcccaaggaactctccaactacaaacatttaacacttatagactt 2339 

Db 2605 GAGCAACAACAAGATCAGTTCCTTAAGCAATTCCTCCTTCACCAACATGAGCCAGCTGAC 2664 

II lllllll II II I I Ml lllll llllllll MUM 
Qy 2340 aagtaacaacagaataagcacgctttctaatcagagcttcagcaacatgacccagctcct 2399 

Db 2665 CACTCTGATCCTCAGCTACAATGCCCTGCAGTGCATCCCGCCTTTGGCCTTCCAGGGACT 2724 

Ml I M II II HIM III II M II III MM I Ml I 
Qy 2400 caccttaattcttagttacaaccgtctgagatgtattcctcctcgcacctttgatggatt 2459 

Db 2725 CCGCTCCCTGCGCCTGCTGTCTCTCCACGGCAATGACATCTCCACCCTCCAAGAGGGCAT 2784 

M M M I M Mill II II llllllll II MM II 



Qy 2460 aaagtctcttcgattactttctctacatggaaatgacatttctgttgtgcctgaaggtgc 2519 

Db 2785 CTTTGCAGACGTGACCTCCCTGTCTCACCTGGCCATTGGTGCCAACCCCCTATACTGTGA 2844 

M II I I I III II II II lllll llllllll || IIMIMI 

Qy 2520 tttcaatgatctttctgcattatcacatctagcaattggagccaaccctctttactgtga 2579 

Db 2845 CTGCCACCTCCGCTGGCTGTCCAGCTGGGTGAAGACTGGCTACAAGGAACCGGGCATTGC 2904 

M II I I Ml I Ml llllllll I | || Mill || || IMM 

Qy 2580 ttgtaacatgcagtggttatccgactgggtgaagtcggaatataaggagcctggaattgc 2639 

Db 2905 TCGTTGTGCTGGGCCCCAGGACATGGAGGGCAAGCTGCTCCTCACCACGCCTGCCAAGAA 2964 

M 1 1 1 1 M 1 1 1 1 II II MM I II II I Mill II II MM II 

Qy 2640 tcgttgtgctggtcctggagaaatggcagataaacttttactcacaactccctccaaaaa 2699 

Db 2965 GTTTGAATGCCAAGGTCCTCCAACGCTGGCTGTCCAGGCCAAGTGTGATCTCTGCTTGTC 3024 

Ml II lllllllll I I I I || MMM I I Mil I M 

Qy 2700 atttacctgtcaaggtcctgtggatgtcaatattctagctaagtgtaacccctgcctatc 2759 

Db 3025 CAGTCCGTGCCAGAACCAGGGCACCTGCCACAACGACCCCCTTGAGGTGTACAGGTGCGC 3084 

I MUM I II I lllll II II II II MM I III I III I 

Qy 2760 aaatccgtgtaaaaatgatggcacatgtaatagtgatccagttgacttttaccgatgcac 2819 

Db 3085 CTGCCCCAGCGGCTATAAGGGTCGAGACTGTGAGGTGTCCCTGAACAGCTGTTCCAGTGG 3144 

III M M I lllll I llllllll III I I III MM 

Qy 2820 ctgtccatatggtttcaaggggcaggactgtgatgtcccaattcatgcctgcatcagtaa 2879 

Db 3145 CCCCTGTGAAAATGGGGGCACCTGCCATGCACAGGAGGGCGAGGATGCCCCGTTCACGTG 3204 

Ml III II IMI II II lllll I IMI II II II I III III 

Qy 2880 cccatgtaaacatggaggaacttgccacttaaaggaaggagaagaagatggattctggtg 2939 

Db 3205 CTCCTGTCCCACCGGCTTTGAAGGACCAACCTGTGGGGTGAACACAGATGACTGTGTGGA 3264 

Ml I II lllllllll II MM II III lllll IMI II 

Qy 2940 tatttgtgctgatggatttgaaggagaaaattgtgaagtcaacgttgatgattgtgaaga 2999 

Db 3265 TCATGCCTGTGCCAATGGGGGCGTCTGTGTGGATGGTGTGGGCAACTACACCTGCCAGTG 3324 

I Ml Hill III Mill Mill | llllllll IMI II 

Qy 3000 taatgactgtgaaaataattctacatgtgtcgatggcattaataactacacatgcctttg 3059 

Db 3325 CCCCCTGCAGTATGAGGGAAAGGCCTGTGAGCAGCTGGTGGACTTGTGCTCTCCGGATCT 3384 

Ml I lllll II II IIIMI II I lllllll II I I Ml II 

Qy 3060 cccacctgagtatacaggtgagttgtgtgaggagaagctggacttctgtgcccaggacct 3119 

Db 3385 GAACCCATGTCAACACGAGGCCCAGTGTGTGGGCACCCCGGATGGGCCCAGGTGTGAGTG 3444 

mm ii ii iiiii i mi i mm i n n mm n 

Qy 3120 gaacccctgccagcacgattcaaagtgcatcctaactccaaagggattcaaatgtgactg 3179 

Db 3445 CATGCCAGGTTATGCAGGTGACAACTGCAGTGAGAACCAGGATGACTGCAGGGACCACCG 3504 

II IMM II I IIIMI Mill I II IIIMI Ml II 

Qy 3180 cacaccagggtacgtaggtgaacactgcgacatcgattttgacgactgccaagacaacaa 3239 

Db 3505 CTGCCAGAATGGGGCCCAGTGTATGGATGAAGTCAACAGCTACTCCTGCCTCTGTGCTGA 3564 

M I M M Mill M I MM Ml III IMI I III I II I II 

Qy 3240 gtgtaaaaacggagcccactgcacagatgcagtgaacggctatacgtgcatatgccccga 3299 

Db 3565 GGGCTACAGTGGACAGCTCTGTGAGATCCCTCC--CCATCTGCCTGCCCCCAAG-AGCCC 3621 

M MMIIII I llllllll I IIM llll Ml II I I Mill 

Qy 3300 aggttacagtggcttgttctgtgagttttctccacccatggtcctccctcgtaccagccc 3359 

Db 3622 CTGTGAGGGGACTGAGTGCCAGAATGGGGCCAACTGTGTGGACCAGGGCAACAGGCCTGT 3681 

HUM III II llllllll II llll II I II Ml | 

Qy 3360 ctgtgataattttgattgtcagaatggagctcagtgtatcgtcagaataaatgagccaat 3419 

Db 3682 GTGCCAGTGCCTCCCAGGCTTCGGTGGCCCTGAGTGTGAGAAGTTGCTCAGTGTCAACTT 3741 

M HIM I II Mil II lllllll II III | Hill II II 

Qy 3420 atgtcagtgtttgcctggctatcagggagaaaagtgtgaaaaattggttagtgtgaattt 3479 

Db 3742 TGTGGATCGGGACACTTACCTGCAGTTCACTGACCTGCAAAACTGGCCACGGGCCAACAT 3801 

I I I II MM II III I II I IMI | | | inn 

Qy 3480 tataaacaaagagtcttatcttcagattccttcagccaaggttcggcctcagacgaacat 3539 

Db 3802 CACGTTGCAGGTCTCCACGGCAGAGGACAATGGGATCCTTCTGTACAACGGGGACAACGA 3861 

M I Ml I MM I II llll II IMM lllll II II lllll II 

Qy 3540 aacacttcagattgccacagatgaagacagcggaatcctcctgtataagggtgacaaaga 3599 
Db 3862 CCACATTGCAGTTGAGCTGTACCAGGGCCATGTGCGTGTCAGCTACGACCCAGGCAGCTA 3921 

III II II II II II II I III I III Mil IMIII III I III I 
Qy 3600 ccatatcgcggtagaactctatcgggggcgtgttcgtgccagctatgacaccggctctca 3659 

Db 3922 CCCCAGCTCTGCCATCTACAGTGCTGAGACGATCAACGATGGGCAATTCCACACCGTTGA 3981 

II llllllll lllllll lllll Mill Mill I I Mill II II 
Qy 3660 tccagcttctgccatttacagtgtggagacaatcaatgatggaaacttccacattgtgga 3719 

Db 3982 GCTGGTTGCCTTTGACCAGATGGTGAATCTCTCCATTGATGGCGGGAGCCCCATGACCAT 4041 

ii iiinii ii mi i i i iii i Mill nil inn i in 

Qy 3720 actacttgccttggatcagagtctctctttgtccgtggatggtgggaaccccaaaatcat 3779 
Db 4042 GGACAACTTTGGCAAACATTACACGCTCAACAGCGAGGCGCCACTCTATGTGGGAGGGAT 4101 

Mill II II I III II II II I IMIMIIIII lllll II 
Qy 3780 cactaacttgtcaaagcagtccactctgaattttgactctccactctatgtaggaggcat 3839 

Db 4102 GCCCGTGGATGTCAACTCAGCTGCCTTCCGCCTGTGGCAGATCCTCAACGGCACCGGCTT 4161 
III I I I III II I I Mil I I I lllll III Mil 
Qy 3840 gccagggaagagtaacgtggcatctctgcgccaggcccctgggcagaacggaaccagctt 3899 

Db 4162 CCACGGTTGCATCCGAAACCTGTACATCAACAACGAGCTGCAGGACTTCACCAAGACGCA 4221 

iiini iiiiiiii inn iiiiiiii imimiiimi m n 

Qy 3900 ccacggctgcatccggaacctttacatcaacagtgagctgcaggacttccagaaggtgcc 3959 
Db 4222 GATGAAGCCAGGCGTGGTGCCAGGCTGCGAACCCTGCCGCAAGCTCTACTGCCTGCATGG 4281 

1 1 ii i inn i mi iiiii n ii mi mi n inn 

Qy 3960 gatgcaaacaggcattttgcctggctgtgagccatgccacaagaaggtgtgtgcccatgg 4019 
Db 4282 CATCTGCCAGCCCAATGCCACCCCAGGGCCCATGTGCCACTGCGAGGCTGGCTGGGTGGG 4341 

II MMIIil! | llll II III I III III II III Mil 
Qy 4020 cacatgccagcccagcagccaggcaggcttcacctgcgagtgccaggaaggatggatggg 4079 



Db 4342 CCTGCACTGTGACCAGCCCGCTGACGGCCCCTGCCATGGCCACAAGTGTGTCCATGGGCA 4401 

I I niiimi I I I I III llll Ml I II II II lllll 

Qy 4080 gcccctctgtgaccaacggaccaatgacccttgccttggaaataaatgcgtacatggcac 4139 

Db 4402 ATGCGTGCCCCTCGACGCTCTTTCCTACAGCTGCCAGTGCCAGGATGGGTACTCGGGGGC 4461 

III Mill Mill I MIMIIMM lllll III III Ml 
Qy 4140 ctgcttgcccatcaatgcgttctcctacagctgtaagtgcttggagggccatggaggtgt 4199 

Db 4462 ACTGTGCAACCAGGCCGGGGCCCTGGCAGAGCCCTGCAGAGGCCTGCAGTGCCTGCATGG 4521 

ii ii i ii iii mi i ii mi i i iiiii mm 

Qy 4200 cctctgtgatgaagaggaggatctgtttaacccatgccaggcgatcaagtgcaagcatgg 4259 
Db 4522 CCACTGCCAGGCCTCAGGCACCAAGGGGGCACACTGTGTGTGTGACCCCGGCTTTTCGGG 4581 

W i iii i mil i i i linn n i n i mi 

Qy 4260 gaagtgcaggctttcaggtctggggcagccctactgtgaatgcagcagtggatacacggg 4319 

Db 4582 CGAGCTGTGTGAGCAAGAGTCCGAGTGCCGGGGGGACCCTGTCCGGGACTTTCACCAGGT 4641 

II lllll I III I II II lllll I I II I I llll 

Qy 4320 ggacagctgtgatcgagaaatctcttgtcgaggggaaaggataagagattattaccaaaa 4379 

Db 4642 CCAGAGGGGCTATGCCATCTGCCAGACCACGCGCCCCCTGTCATGGGTGGAGTGCCGGGG 4701 

iii imiMM inn ii ii iiii i i mm i n 

Qy 4380 gcagcagggctatgctgcttgccaaacaaccaagaaggtgtcccgattagagtgcagagg 4439 
Db 4702 CTCGTGCCCAGGCCAGGGCTGCTGCCAGGGCCTTCGGCTGAAGCGGAGGAAGTTCACCTT 4761 

iii mm i iiiii ii n mm mi iiiii 

Qy 4440 tgggtgtgcaggagggcagtgctgtggaccgctgaggagcaagcggcggaaatactcttt 4499 

Db 4762 TGAGTGCAGCGATGGGACCTCTTTTGCCGAGGAGGTGGAAAAGCCCACCAAGTGTGGCTG 4821 

II llll II II llll llll II Mill II II lllll lllll 

Qy 4500 cgaatgcactgacggctcctcctttgtggacgaggttgagaaagtggtgaagtgcggctg 4559 

Db 4822 T 4822 
I 

Qy 4560 t 4560 



RESULT 6 

LOCUS AB017169 5015 bp mRNA PRI Of 

DEFINITION Homo sapiens mRNA for Slit-3 protein, complete cds. 



ACCESSION AB017169 
NID g4049588 
VERSION AB017169.1 GI:4049588 
KEYWORDS slit-3; Slit-3 protein. 
SOURCE Homo sapiens fetal tissue_lib:lung cDNA to mRNA. 
ORGANISM Homo sapiens 
Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; 
Primates; Catarrhini; Hominidae; Homo. 
REFERENCE 1 (bases 1 to 5015) 
AUTHORS Itoh,A. and Sakano,S. 
TITLE Direct Submission 
JOURNAL Submitted (27-AUG-1998) to the DDBJ/EMBL/GenBank databases. Akira 
Itoh, Asahi Chemical Industry co.,ltd., Life Science Fundamental 
Research Laboratory; 2-1, Samejima, Fuji, Shizuoka 416-8501, Japan 
(E-mail:a8611483@ut.asahi-kasei.co.jp, Tel:+81-545-62-3231, 
Fax:+81-545-62-3249) 
REFERENCE 2 (sites) 
AUTHORS Itoh,A., Miyabayashi,T., Ohno,M. and Sakano,S. 
TITLE Cloning and expressions of three mammalian homologues of Drosophila 
slit suggest possible roles for Slit in the formation and 
maintenance of the nervous system 
JOURNAL Brain Res. Mol. Brain Res. 62 (2), 175-186 (1998) 
MEDLINE 99033071 
FEATURES Location/Qualifiers 
source 1..5015 
/organism="Homo sapiens" 
/db_xref="taxon:9606" 
/dev_stage="fetal" 
/tissue_lib="lung" 
gene 264..4835 
/gene="slit-3" 
CDS 264..4835 
/gene="slit-3" 
/codon_start=1 
/product="Slit-3 protein" 
/protein_id="BAA35186.1" 
/db_xref="PID:d1036172" 
/db_xref="PID:g4049589" 
/db_xref="GI:4049589" 
/translation="MAPGWAGVGAAVRARLALALALASVLSGPPAVACPTKCTCSAAS 
VDCHGLGLRAVPRGIPRNAERLDLDRNNITRITKMDFAGLKNLRVLHLEDNQVSVIER 
GAFQDLKQLERLRLNKNKLQVLPELLFQSTPKLTRLDLSENQIQGIPRKAFRGITDVK 
NLQLDNNHISCIEDGAFRALRDLEILTLNNNNISRILVTSFNHMPKIRTLRLHSNHLY 
CDCHLAWLSDWLRQRRTVGQFTLCMAPVHLRGFNVADVQKKEYVCPAPHSEPPSCNAN 
SISCPSPCTCSNNIVDCRGKGLMEIPANLPEGIVEIRLEQNSIKAIPAGAFTQYKKLK 
RIDISKNQISDIAPDAFQGLKSLTSLVLYGNKITEIAKGLFDGLVSLQLLLLNANKIN 
CLRVNTFQDLQNLNLLSLYDNKLQTISKGLFAPLQSIQTLHLAQNPFVCDCHLKWLAD 
YLQDNPIETSGARCSSPRRLANKRISQIKSKKFRCSGSEDYRSRFSSECFMDLVCPEK 
CRCEGTIVDCSNQKLVRIPSHLPEYVTDLRLNDNEVSVLEATGIFKKLPNLRKINLSN 
NKIKEVREGAFDGAASVQELMLTGNQLETVHGRVFRGLSGLKTLMLRSNLIGCVSNDT 
FAGLSSVRLLSLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLAWLGRWLRKRR 
IVSGNPRCQKPFFLKEIPIQDVAIQDFTCDGNEESSCQLSPRCPEQCTCMETVVRCSN 
KGLRALPRGMPKDVTELYLEGNHLTAVPRELSALRHLTLIDLSNNSISMLTNYTFSNM 
SHLSTLILSYNRLRCIPVHAFNGLRSLRVLTLHGNDISSVPEGSFNDLTSLSHLALGT 
NPLHCDCSLRWLSEWVKAGYKEPGIARCSSPEPMADRLLLTTPTHRFQCKGPVDINIV 
AKCNACLSSPCKNNGTCTQDPVELYRCACPYSYKGKDCTVPINTCIQNPCQHGGTCHL 
SDSHKDGFSCSCPLGFEGQRCEINPDDCEDNDCENNATCVDGINNYVCICPPNYTGEL 
CDEVIDHCVPELNLCQHEAKCIPLDKGFSCECVPGYSGKLCETDNDDCVAHKCRHGAQ 
CVDTINGYTCTCPQGFSGPFCEHPPPMVLLQTSPCDQYECQNGAQCIVVQQEPTCRCP 
PGFAGPRCEKLITVNFVGKDSYVELASAKVRPQANISLQVATDKDNGILLYKGDNDPL 
ALELYQGHVRLVYDSLSSPPTTVYSVETVNDGQFHSVELVTLNQTLNLVVDKGTPKSL 
GKLQKQPAVGINSPLYLGGIPTSTGLSALRQGTDRPLGGFHGCIHEVRINNELQDFKA 
LPPQSLGVSPGCKSCTVCKHGLCRSVEKDSVVCECRPGWTGPLCDQEARDPCLGHRCH 
HGKCVATGTSYMCRCAEGYGGDLCDNKNDSANACSAFKCHHGQCHISDQGEPYCLCQP 
GFSGEHCQQENPCLGQVVREVIRRQRGYASCATASKVPIMECRGGCGPQCCQPTRSKR 
RKYVFQCTDGSSFVEEVERHLECGCLACS" 
BASE COUNT 1119 a 1611 c 1376 g 909 t 
ORIGIN 



Query Match 25.3%; Score 1202; DB 29; Length 5015; 
Best Local Similarity 64.6%; Pred. No. 0.00e+00; 
Matches 2894; Conservative 0; Mismatches 1566; Indels 21; Gaps 13; 
[Alignment rows Db 360-2567 and Qy 79-2238: apart from the position labels, only fragments of the match lines and the three Db rows below are legible in the source.] 

Db 480 ATCACCAGGATCACCAAGATGGACTTCGCTGGGCTCAAGAACCTCCGAGTCTTGCATCTG 539 

Db 1080 TGCAATGC--CAACTCC-A---TCT-C-CTGCCCTTCGCCCTGCACGTGCAGCAATAAC 1130 

Db 1431 AAGATCAACTGCCTGCGGGTGAACACGTTTCAGGACCTGCAGAACCTCAACTTGCTCTCC 1490 

Qy 2239 ggtattccaagagatgtcacagagttgtatctggatggaaaccaatttacactggttccc 2298 

Db 2568 AGAGAGCTGTCCGCCCTCCGACACCTGACGCTTATTGACCTGAGCAACAACAGCATCAGC 2627 
I II II III I I III I II Mill III I II llllllll II III 

Qy 2299 aaggaactctccaactacaaacatttaacacttatagacttaagtaacaacagaataagc 2358 

Db 2628 ATGCTGACCAATTACACCTTCAGTAACATGTCTCACCTCTCCACTCTGATCCTGAGCTAC 2687 

I III I III I I llllll MINI I II III III I II II II III 

Qy 2359 acgctttctaatcagagcttcagcaacatgacccagctcctcaccttaattcttagttac 2418 

Db 2688 AACCGGCTGAGGTGCATCCCCGTCCACGCCTTCAACGGGCTGCGGTCCCTGCGAGTGCTA 2747 
HIM Mill II II II II MM I II I III II III I II 

Qy 2419 aaccgtctgagatgtattcctcctcgcacctttgatggattaaagtctcttcgattactt 2478 
Db 2748 ACCCTCCATGGCAATGACATTTCCAGCGTTCCTGAAGGCTCCTTCAACGACCTCACATCT 2807

I 1 1 Mill MIMMIIII II llllllll I Mill II II I I 

Qy 2479 tctctacatggaaatgacatttctgttgtgcctgaaggtgctttcaatgatctttctgca 2538
Db 2808 CTTTCCCATCTGGCGCTGGGAACCAACCCACTCCACTGTGACTGCAGTCTTCGGTGGCTG 2867

I II Mill II I III in; m IIMIII III I I Mil I 
Qy 2539 ttatcacatctagcaattggagccaaccctctttactgtgattgtaacatgcagtggtta 2598 

Db 2868 TCGGAGTGGGTGAAGGCGGGGTACAAGGAGCCTGGCATCGCCCGCTGCAGTAGCCCTGAG 2927

II II MUM 1 1 III || MIMMIIII II II II II II MM 

Qy 2599 tccgactgggtgaagtcggaatataaggagcctggaattgctcgttgtgctggtcctgga 2658 

Db 2928 CCCATGGCTGACAGGCTCCTGCTCACCACCCCAACCCACCGCTTCCAGTGCAAAGGGCCA 2987 

Mill llllll Mill M II II I M II Mil II 
Qy 2659 gaaatggcagataaacttttactcacaactccctccaaaaaatttacctgtcaaggtcct 2718 

Db 2988 GTGGACATCAACATTGTGGCCAAATGCAATGCCTGCCTCTCCAGCCCGTGCAAGAATAAC 3047 

Mill llll III I II II II II IIMIII II I Mill II III I 
Qy 2719 gtggatgtcaatattctagctaagtgtaacccctgcctatcaaatccgtgtaaaaatgat 2778 

Db 3048 GGGACATGCACCCAGGACCCTGTGGAGCTGTACCGCTGTGCCTGCCCCTACAGCTACAAG 3107 

II Mill I II II M M I Mill M llll MM II llll 
Qy 2779 ggcacatgtaatagtgatccagttgacttttaccgatgcacctgtccatatggtttcaag 2838 

Db 3108 GGCAAGGACTGCACTGTGCCCATCAACACCTGCATCCAGAACCCCTGTCAGCATGGAGGC 3167 

II IIMIII III II II I llllllll Mill III I llllllll 
Qy 2839 gggcaggactgtgatgtcccaattcatgcctgcatcagtaacccatgtaaacatggagga 2898 

Db 3168 ACCTGCCACCTGAGTGACAGCCACAAGGATGGGTTCAGCTGCTCCTGCCCTCTGGGCTTT 3227 
M llllll I I II I I I Mill III I M II II II III 

Qy 2899 acttgccacttaaaggaaggagaagaagatggattctggtgtatttgtgctgatggattt 2958
Db 3228 GAGGGGCAGCGGTGTGAGATCAACCCAGATGACTGTGAGGACAACGACTGCGAAAACAAT 3287

II M I Mill Mill Mill Mill II II Mill Mill III 
Qy 2959 gaaggagaaaattgtgaagtcaacgttgatgattgtgaagataatgactgtgaaaataat 3018 

Db 3288 GCCACCTGCGTGGACGGGATCAACAACTACGTGTGTATCTGTCCGCCTAACTACACAGGT 3347

I II II M M M M II llllll II I II II III I II llllll 
Qy 3019 tctacatgtgtcgatggcattaataactacacatgcctttgcccacctgagtatacaggt 3078 

Db 3348 GAGCTATGCGACGAGGTGATTGACCACTGTGTGCCTGAGCTGAACCTCTGTCAGCATGAG 3407

Ml I M II Ml I I III Mill I II IIMIII III lllll II 
Qy 3079 gagttgtgtgaggagaagctggacttctgtgcccaggacctgaacccctgccagcacgat 3138 

Db 3408 GCCAAGTGCATCCCCCTGGACAAAGGATTCAGCTGCGAGTGTGTCCCTGGCTACAGCGGG 3467 

I IIIMIMM II IIMIII M II II II II III II 

Qy 3139 tcaaagtgcatcctaactccaaagggattcaaatgtgactgcacaccagggtacgtaggt 3198 

Db 3468 AAGCTCTGTGAGACAGACAATGATGACTGTGTGGCCCACAAGTGCCGCCACGGGGCCCAG 3527 

I I III II I II III Mill I I IMIMI llll lllll 
Qy 3199 gaacactgcgacatcgattttgacgactgccaagacaacaagtgtaaaaacggagcccac 3258 

Db 3528 TGCGTGGACACAATCAATGGCTACACATGCACCTGCCCCCAGGGCTTCAGTGGACCCTTC 3587

Ml II II I II Mill II llll llllll I II I llllll III 
Qy 3259 tgcacagatgcagtgaacggctatacgtgcatatgccccgaaggttacagtggcttgttc 3318 

Db 3588 TGTGAACACCCCCCACCCATGGTCCTACTGCAGACCAGCCCATGCGACCAGTACGAGTGC 3647 

Mill I MIMIIIIIIMI I I IIIIM II II II II II 
Qy 3319 tgtgagttttctccacccatggtcctccctcgtaccagcccctgtgataattttgattgt 3378 



Db 3648 CAGAACGGGGCCCAGTGCATCGTGGTGCAGCAGGAGCCCACCTGCCGCTGCCCACCAGGT 3707

lllll II II Mill lllll I Mill I II I II II II 

Qy 3379 cagaatggagctcagtgtatcgtcagaataaatgagccaatatgtcagtgtttgcctggc 3438 

Db 3708 TTCGCCGGCCCCAGATGCGAGAAGCTCATCACTGTCAACTTCGTGGGCAAAGACTCCTAC 3767 

I M I II II II I I I III II II I IIIIM M II 

Qy 3439 tatcagggagaaaagtgtgaaaaattggttagtgtgaattttataaacaaagagtcttat 3498 

Db 3768 GTGGAACTGGCCTCCGCCAAGGTCCGACCCCAGGCCAACATCTCCCTGCAGGTGGCCACT 3827 

i i i i ii iiiiiiii ii ii m i iiiii i ii iii i urn 

Qy 3499 cttcagattccttcagccaaggttcggcctcagacgaacataacacttcagattgccaca 3558 
Db 3828 GACAAGGACAACGGCATCCTTCTCTACAAAGGAGACAATGACCCCCTGGCACTGGAGCTG 3887 

II I llll III Mill II II II II Mill MM III III M 

Qy 3559 gatgaagacagcggaatcctcctgtataagggtgacaaagaccatatcgcggtagaactc 3618 

Db 3888 TACCAGGGCCACGTGCGGCTGGTCTATGACAGCCTGAGTTCCCCTCCAACCACAGTGTAC 3947 

ii I in i ii ii mimi i i ii ii i i iii 

Qy 3619 tatcgggggcgtgttcgtgccagctatgacaccggctctcatccagcttctgccatttac 3678 
Db 3948 AGTGTGGAGACAGTGAATGATGGGCAGTTTCACAGTGTGGAGCTGGTGACGCTAAACCAG 4007 

milium i mini i n mi hum i i i i mi 

Qy 3679 agtgtggagacaatcaatgatggaaacttccacattgtggaactacttgccttggatcag 3738 
Db 4008 ACCCTGAACCTAGTAGTGGACAAAGGAACTCCAAAGAGCCTGGGGAAGCTCCAGAAGCAG 4067 

i ii i iiiii ii i ii im i i iii mm 

Qy 3739 agtctctctttgtccgtggatggtgggaaccccaaaatcatcactaacttgtcaaagcag 3798 

Db 4068 CCAGCAGTGGGCATCAACAGCCCCCTCTACCTTGGAGGCATCCCCACCTCCACCGGCCTC 4127

MM I II II Mill I llllllll II III 
Qy 3799 tccactctgaattttgactctccactctatgtaggaggcatgccagggaagagtaacgtg 3858 

Db 4128 TCCGCCTTGCGCCAGGGCACGGACCGGCCTCTAGGCGGCTTCCACGGATGCATCCATGAG 4187 

i i mum mim i i iiimmi mm i 

Qy 3859 gcatctctgcgccaggcccctgggcagaacggaaccagcttccacggctgcatccggaac 3918 

Db 4188 GTGCGCATCAACAACGAGCTGCAGGACTTCAAGGCCCTCCCACCACAGTCCCTGGGGGTG 4247 

I llllllll 1 1 1 1 1 M I M 1 1 1 1 1 II III Ml II 
Qy 3919 ctttacatcaacagtgagctgcaggacttccagaaggtgccgatgcaaacaggcattttg 3978 

Db 4248 TCAC-CAGGCTGCAA-GTC-CTGCACCGTGTGCAAGCACGGCCTGTGCCGCTCCGTGGAG 4304 

I II II I I I I I lllll II Ml llll II 

Qy 3979 cctggctgtgagccatgccacaagaaggtgtgtgcccatggcacatgccagcccagcagc 4038 

Db 4305 AAGGACAGCGTGGTGTGCGAGTGCCGCCCAGGCTGGACCGGCCCACTCTGCGATCAGGAG 4364 

Ml M I lllimill III MM II II Mill II M I 
Qy 4039 caggcaggcttcacctgcgagtgccaggaaggatggatggggcccctctgtgaccaacgg 4098 

Db 4365 GCCCGGGACCCCTGCCTCGGCCACAGATGCCACCATGGAAAATGTGTGGCAA-C--TGGG 4421

II lllll lllll II I I llll Mill I II III I I II I 

Qy 4099 accaatgacccttgccttggaaataaatgcgtacatggcacctgcttgcccatcaatgcg 4158

Db 4422 ACCTCATACATGTGCAAGTGTGCCGAGGGCTATGGAGGGGACTTGTGTGACAACAAGAAT 4481 

III llll II lllll llllll IIMIII I I I lllll I II I 
Qy 4159 ttctcctacagctgtaagtgcttggagggccatggaggtgtcctctgtgatgaagaggag 4218 

Db 4482 GACTCTGCCAATGCCTGCTCAGCCTTCAAGTGTCACCATGGGCAGTGCCACATCTCAGAC 4541 

II II I III M MIMM I llllll lllll I llll 

Qy 4219 gatctgtttaacccatgccaggcgatcaagtgcaagcatgggaagtgcaggctttcaggt 4278 

Db 4542 CAAGGGGAGCCCTACTGCCTGTGCCAGCCCGGCTTTAGCGGCGAGCACTGCCAACAAGAG 4601

i iii Minimi im ii i i ii ii im i i iii 

Qy 4279 ctggggcagccctactgtgaatgcagcagtggatacacgggggacagctgtgatcgagaa 4338 

Db 4602 AATCCGTGCCTGGGACAAGTAGTCCGAGAGGTGATCCGCCGCCAGAAAGGTTATGCATCA 4661 

I I M I M II I MM II III I M Mill I 

Qy 4339 atctcttgtcgaggggaaaggataagagattattaccaaaagcagcagggctatgctgct 4398 

Db 4662 TGTGCCACAGCCTCCAAGGTGCCCATCATGGAATGTCGTGGGGGCTGTG--GGCCC-CAG 4718

II III II llllll II lllll I II II llll II III 

Qy 4399 tgccaaacaaccaagaaggtgtcccgattagagtgcagaggtgggtgtgcaggagggcag 4458 
Db 4719 TGCTGCCAGCCCACCCGCAGCAAGCGGCGGAAATACGTCTTCCAGTGCACGGACGGCTCC 4778

urn ii i iiiiiiiiiiiiiiiiii in i inn iiiiinn 

Qy 4459 tgctgtggaccgctgaggagcaagcggcggaaatactctttcgaatgcactgacggctcc 4518 

Db 4779 TCGTTTGTAGAAGAGGTGGAGAGACACTTAGAGTGCGGCTG 4819 

ii inn ii inn m i i i minim 

Qy 4519 tcctttgtggacgaggttgagaaagtggtgaagtgcggctg 4559
LOCUS AB011530 4950 bp mRNA ROD 22-AUG-1998
DEFINITION Rattus norvegicus mRNA for MEGF4, complete cds.
ACCESSION AB011530
NID g3449289
VERSION AB011530.1 GI:3449289
KEYWORDS MEGF4.
SOURCE Rattus norvegicus (strain:Sprague-Dawley) adult brain cDNA to mRNA,
clone_lib:pSPORT 1.
ORGANISM Rattus norvegicus
Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria;
Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
REFERENCE 1 (bases 1 to 4950)
AUTHORS Nakayama,M., Nakajima,D. and Ohara,O.
TITLE Direct Submission
JOURNAL Submitted (26-FEB-1998) to the DDBJ/EMBL/GenBank databases. Manabu
Nakayama, Kazusa DNA Research Institute, Laboratory of DNA
technology; 1532-3, Yana, Kisarazu, Chiba 292-0812, Japan
(E-mail:nmanabu@kazusa.or.jp, Tel:+81-438-52-3915,
Fax:+81-438-52-3914)
REFERENCE 2 (sites)
AUTHORS Nakayama,M., Nakajima,D., Nagase,T., Nomura,N., Seki,N. and
Ohara,O.
TITLE Identification of high-molecular-weight proteins with multiple
EGF-like motifs by motif-trap screening
JOURNAL Genomics 51 (1), 27-34 (1998)
MEDLINE 98360089
FEATURES Location/Qualifiers
source 1..4950
/organism="Rattus norvegicus"
/strain="Sprague-Dawley"
/db_xref="taxon:10116"
/clone_lib="pSPORT 1"
/dev_stage="adult"
/tissue_type="brain"
gene 160..4755
/gene="MEGF4"
CDS 160..4755
/gene="MEGF4"
/note="rat homologue of Drosophila slit protein; rat
slit1"
/codon_start=1
/product="MEGF4"
/protein_id="BAA32460.1"
/db_xref="PID:d1033423"
/db_xref="PID:g3449290"
/db_xref="GI:3449290"

/translation="MALTPQRGSSSGLSRPELWLLLWAAAWRLGATACPALCTCTGTT 
VDCHGTGLQAIPKNIPRNTERLELNGNNITRIHKNDFAGLKQLRVLQLMENQIGAVER 
GAFDDMKELERLRLNRNQLQVLPELLFQNNQALSRLDLSENSLQAVPRKAFRGATDLK 
NLQLDKNQISCIEEGAFRALRGLEVLTLNNNNITTIPVSSFNHMPKLRTFRLHSNHLF 
CDCHLAWLSQWLRQRPTIGLFTQCSGPASLRGLNVAEVQKSEFSCSGQGEAAQVPACT 
LSSGSCPAMCSCSNGIVDCRGKGLTAIPANLPETMTEIRLELNGIKSIPPGAFSPYRK 
LRRIDLSNNQIAEIAPDAFQGLRSLNSLVLYGNKITDLPRGVFGGLYTLQLLLLNANK 
INCIRPDAFQDLQNLSLLSLYDNKIQSLAKGTFTSLRAIQTLHLAQNPFICDCNLKWL 
ADFLRTNPIETTGARCASPRRLANKRIGQIKSKKFRCSAKEQYFIPGTEDYHLNSECT 
SDVACPHKCRCEASVVECSGLKLSKIPERIPQSTTELRLNNNEISILEATGLFKKLSH
LKKINLSNNKVSEIEDGTFEGATSVSELHLTANQLESVRSGMFRGLDGLRTLMLRNNR 
ISCIHNDSFTGLRNVRLLSLYDNHITTISPGAFDTLQALSTLNLLANPFNCNCQLAWL 
GDWLRKRKIVTGNPRCQNPDFLRQIPLQDVAFPDFRCEEGQEEVGCLPRPQCPQECAC 
LDTVVRCSNKHLQALPKGIPKNVTELYLDGNQFTLVPGQLSTFKYLQLVDLSNNKISS 
LSNSSFTNMSQLTTLILSYNALQCIPPLAFQGLRSLRLLSLHGNDVSTLQEGIFADVT 
SLSHLAIGANPLYCDCHLRWLSSWVKTGYKEPGIARCAGPPEMEGKLLLTTPARKFEC 
QGPPSLAVQAKCDPCLSSPCQNQGTCHNDPLEVYRCTCPSGYKGRNCEVSLDSCSSNP
CGNGGTCHAQEGEDAGFTCSCPSGFEGLTCGMNTDDCVKHDCVNGGVCVDGIGNYTCQ 
CPLQYTGRACEQLVDFCSPDLNPCQHEAQCVGTPEGPRCECVPGYTGDNCSKNQDDCK 
DHQCQNGAQCVDEINSYACLCAEGYSGQLCEIPPAPRNSCEGTECQNGANCVDQGSRP 
VCQCLPGFGGPECEKLLSVNFVDRDTYLQFTDLQNWPRANITLQVSTAEDNGILLYNG 
DNDHIAVELYQGHVRVSYDPGSYPSSAIYSAETINDGQFHTVELVTFDQMVNLSIDGG 
SPMTMDNFGKHYTLNSEAPLYVGGMPVDVNSAAFRLWQILNGTSFHGCIRNLYINNEL 
QDFTKTQMKPGVVPGCEPCRKLYCLHGICQPNATPGPVCHCEAGWGGLHCDQPVDGPC
HGHKCVHGKCVPLDALAYSCQCQDGYSGALCNQVGAVAEPCGGLQCLHGHCQASATRG 
AHCVCSPGFSGELCEQESECRGDPVRDFHRVQRGYAICQTTRPLSWVECRGACPGQGC 
CQGLRLKRRKLTFECSDGTSFAEEVEKPTKCGCAPCA" 
polyA_site 4950
/note="15 a nucleotides"
BASE COUNT 1076 a 1501 c 1398 g 975 t
ORIGIN



Query Match 24.7%; Score 1177; DB 32; Length 4950;

Best Local Similarity 64.9%; Pred. No. 0.00e+00;

Matches 2928; Conservative 0; Mismatches 1553; Indels 33; Gaps 23;



Db 255 GGCGTGCCCCGCCCTCTGCACCTGCACCGGCACTACAGTGGACTGCCACGGCACGGGTTT 314 

Mil ii i in i in i mi iiiiiiiiin inn H i 

Qy 78 ggcgtgcccggcgcagtgctcttgctcgggcagcacagtggactgtcacgggctggcgct 137 
Db 315 GCAGGCCATCCCCAAGAACATCCCACGGAACACCGAGCGCCTGGAACTCAATGGTAACAA 374 

ii 1 1 mi in mil ii mmm I Mill I Mill II II 
Qy 138 gcgcagcgtgcccaggaatatcccccgcaacaccgagagactggatttaaatggaaataa 197 

Db 375 TATCACTCGGATCCATAAGAATGACTTTGCTGGGCTCAAGCAGCTTCGAGTGTTGCAGCT 434 

Mill I II MM II 1 1 1 M 1 1 1 II I II II MM I Mill 
Qy 198 catcacaagaattacgaagacagattttgctggtcttagacatctaagagttcttcagct 257 

Db 435 GATGGAGAACCAGATTGGAGCTGTGGAACGGGGAGCTTTTGATGACATGAAGGAGTTGGA 494 

MIIMM Mill I I I III I Mill II I II I M II I || 
Qy 258 tatggagaataagattagcaccattgaaagaggagcattccaggatcttaaagaactaga 317 

Db 495 GCGGCTACGACTGAACCGCAACCAGCTGCAAGTGTTGCCTGAACTACTGTTCCAGAACAA 554 

I I M II I Ml I II II II II MM Mill I Mill I I 
Qy 318 gagactgcgtttaaacagaaatcaccttcagctgtttcctgagttgctgtttcttgggac 377 

Db 555 CCAGGCTCTGTCCAGACTGGACCTGAGTGAGAATTCTCTCCAGGCTGTGCCCAGAAAGGC 614 

I II I Ml M II II Mill M I Mill I II II M II 
Qy 378 tgcgaagctatacaggcttgatctcagtgaaaaccaaattcaggcaatcccaaggaaagc 437 

Db 615 TTTCCGAGGAGCCACAGACCTCAAAAATCTACAGCTGGACAAGAACCAGATCAGCTGCAT 674 

mum ii ii ill i mm i n inn i 11111111111111 n 

Qy 438 tttccgtggggcagttgacataaaaaatttgcaactggattacaaccagatcagctgtat 497 

Db 675 CGAGGAAGGGGCCTTCCGTGCCCTACGGGGACTGGAGGTTCTGACCCTGAACAACAACAA 734 

II M Mill Ml I II II MM Mill II II II II Mill Mill 
Qy 498 tgaagatggggcattcagggctctccgggacctggaagtgctcactctcaacaataacaa 557 

Db 735 CATCACCACCATCCCTGTGTCCAGTTTCAACCATATGCCCAAGCTTCGGACCTTCCGGCT 794 

III II I I Mill I MIMIMIIIMIIM II III MM II M II 
Qy 558 cattactagactttctgtggcaagtttcaaccatatgcctaaacttaggacttttcgact 617 

Db 795 GCACTCCAACCACCTGTTCTGTGACTGTCACCTGGCCTGGCTCTCGCAGTGGCTGAGACA 854 

Ml II III MUM IIMMII I M 1 1 ! M I M 1 1 1 i 1 1 I Mill I I 
Qy 618 gcattcaaacaacctgtattgtgactgccacctggcctggctctccgactggcttcgcaa 677 

Db 855 GAGGCCCACCATCGGGCTCTTCACCCAGTGCTCTGGGCCCGCCAGCCTTCGGGGCCTCAA 914 

Mill I II III III Mill II III II III I MM II 
Qy 678 aaggcctcgggttggtctgtacactcagtgtatgggcccctcccacctgagaggccataa 737 

Db 915 TGTGGCAGAGGTGCAAAAGAGTGAGTTCAGCTGCTCAGGCCAGGGGGA-GG-CTGCACAA 972 

Ml II Mill Mill I II II MM I III M 1 1 I II 
Qy 738 tgtagccgaggttcaaaaacgagaatttgtctgcagtgatgaggaagaaggtcaccagtc 797 

Db 973 GTACCTGCCTGCACCCTCTCATCCGGTTC-CTGCCCAGCCATGTGCAGCTGTAGCAATGG 1031 

I M II I I I I I || IMMI Ml || | MIIIIMI 
Qy 798 atttatggctccttcttgtagtgttttgcactgccctgccgcctgtacctgtagcaacaa 857 

Db 1032 CATCGTGGACTGCCGTGGCAAAGGCCTCACTGCCATCCCTGCTAACCTTCCGGAGACGAT 1091 
Mill Mill Mill Mill MMIM Mill I II Mill Mill II 
Qy


858 


Db 


1092 


Qy 


918 


Db 


1152 


Qy 


978 


Db 


1212 


Qy 


1038 


Db 


1272 






1 

W 




Qy 


1158 


Db 


1392 


Qy 


1218 


Db 


1452 


Qy 


1278 


Db 


1512 


Qy 


1338 


Db 


1572 


Qy 


1398 


Db 


1632 


Qy 


1458 








1500 


i 

w 




Qy 


1560 






Qy 


1620 


Db 


1872 


Qy 




Db 


1932 


Qy 




Db 


1992 


Qy 


1800 


Db 


2052 


Qy 


1860 


Db 


2112 


Qy 


1920 



858 tatcgtagactgtcgtgggaaaggtctcactgagatccccacaaatcttccagagaccat 917 



ii ii ii ii mi i mi nun nun n in inn n 



ii i linn i i ii mum n n mil mi mini n 

atataaaaagcttagacgaattgacctgagcaataatcagatctctgaacttgcaccaga 1037 



111 imiiii ii inn mini n iminmi n n mini 



i in i ii mm in mi n i inn n i minim 



mm linn i in mn n inn n n m i n inn 



immniiiiiii i mi iimim inn i i n mini 



ii n inn imnninn n n nun i iiinmin n 



ii I m minni mini i in mil n n inn n 



in n mn i n ii miiiniiiiinii mn n mi i in 



n I ii n 



n i mi i n i i i i i n i i i 

--gatcaaa-atta--ag-tggagactgctt-tgcggat--c-t 1499 



ill ii n i mn mini i i n n mm i mn 



i n mn mi ii mini in i in i n n n n mn 



n i i i n mn n i iniini n i n i mini 



i i mi m i mi inn n ii i 1 1 1 1 1 ; n i n 



i in i n n n i mi in 



inn in 



i ii n i ii mn mi i in ii ii i in i mn 



in mi n n n mn i ii in i ii n ii n n ii ii 



i i ii ii ii ii n ii mn i i n n i ! 1 1 ; 1 1 1 1 mini 



Db 


2172 


Qy 


1980 


Db 


2232 


Qy 


2040 


Db 


2292 


Qy 


2100 


Db 


2352 


Qy 


2160 


Db 


2412 


Qy 


2220 


Db 


2472 


Qy 


2280 


Db 


2532 


Qy 


2340 


Db 


2592 


Qy 


2400 


Db 


2652 


Qy 


2460 


Db 


2712 


Qy 


2520 


Db 


2772 


Qy 


2580 


Db 


2832 


Qy 


2640 


Db 


2892 


Qy 


2700 


Db 


2952 


Qy 


2760 


Db 


3012 


Qy 


2820 


Db 


3072 


Qy 


2880 


Db 


3132 


Qy 


2940 


Db 


3192 


Qy 


3000 



ii ii mn mm i nn m mi n mn i in n m 



m n ii ii in i mn ii in muni inn i n n 



mini i i mini him ii ii inn i m n i 



m ii ii n n i in mi ii n mn minium 



i in mi ii ii ii ii i mn i n mini n 



m n minimi mini n m n i in i in i 



ii mini ii in i i ii 



mn nun mn 



m i ii ii ii linn ii ii ii n mi nn i m i 



ii ii ii ii n in i nm mn in m n ii 



n ii i i i i ii ii ii ii nm mini n mn n 



ii n i i m i in 'in i i n nm mn nm 



n nun ii in mi i mi n inn n n in n 



m n inimi 



i in n mm i nun i n 



mi m i mn in i i in ii nn i in 



in n m i iiiiii i mini n m i in in n 



mini i mi nm mn mi n n n i in m in 

cccatgtaaacatggaggaacttgccacttaaaggaaggagaagaagatggattctggtg 2939 



in n ii mm i nn i i in mmnn i 



i inn m i nm mimi n ininn i n 
Db 3252 CCCCCTGCAGTACACAGGAAGGGCCTGTGAACAGCTAGTGGACTTCTGCTCTCCGGATCT 3311

Ml I INI INN I Mill II llllllllll I I III || 
Qy 3060 cccacctgagtatacaggtgagttgtgtgaggagaagctggacttctgtgcccaggacct 3119

Db 3312 GAACCCATGTCAGCATGAGGCCCAATGTGTTGGCACCCCAGAGGGGCCCAGGTGTGAGTG 3371 

nun ii inn n i i n i n m mi n inn n 

Qy 3120 gaacccctgccagcacgattcaaagtgcatcctaactccaaagggattcaaatgtgactg 3179 

Db 3372 TGTGCCAGGCTACACGGGTGACAATTGCAGCAAAAACCAGGACGACTGCAAGGACCACCA 3431 

Hill Ul Mill I III II I Mill I I III || I 
Qy 3180 cacaccagggtacgtaggtgaacactgcgacatcgattttgacgactgccaagacaacaa 3239 

Db 3432 GTGTCAGAACGGGGCTCAGTGTGTGGATGAGATCAACAGCTACGCCTGTCTCTGTGCTGA 3491

Ml I Hill II II II llll I III llll II! Ill Ml 
Qy 3240 gtgtaaaaacggagcccactgcacagatgcagtgaacggctatacgtgcatatgccccga 3299

Db 3492 GGGCTACAGCGGACAGCTCTGTGAGAT--C-CCACC--TG--CC-CCCAGG-A--A-CTC 3539 

II lllll II I llllllll I I lllll II II III II III 
Qy 3300 aggttacagtggcttgttctgtgagttttctccacccatggtcctccctcgtaccagccc 3359 

Db 3540 CTGTGAAGGGACTGAGTGCCAGAATGGAGCCAACTGTGTAGACCAGGGCAGCCGGCCCGT 3599

MM III || IMIIIIMI I III | | | | mi 
Qy 3360 ctgtgataattttgattgtcagaatggagctcagtgtatcgtcagaataaatgagccaat 3419

Db 3600 ATGCCAGTGCCTACCAGGCTTTGGAGGTCCCGAATGTGAGAAGTTGCTCAGTGTCAACTT 3659 

III Mill I II III! I II I lllll II III I lllll II II 
Qy 3420 atgtcagtgtttgcctggctatcagggagaaaagtgtgaaaaattggttagtgtgaattt 3479 

Db 3660 TGTGGACCGGGACACTTACCTGCAGTTCACTGACCTGCAGAACTGGCCTCGGGCCAACAT 3719 

I i ii n mi ii iii i ii ii mm i i in 

Qy 3480 tataaacaaagagtcttatcttcagattccttcagccaaggttcggcctcagacgaacat 3539 
Db 3720 CACTCTTCAGGTCTCCACAGCAGAGGACAATGGGATCCTCCTCTACAATGGGGATAATGA 3779

ii mill i mm n mi n muni n n n n n n 

Qy 3540 aacacttcagattgccacagatgaagacagcggaatcctcctgtataagggtgacaaaga 3599 

Db 3780 CCACATTGCAGTTGAGCTGTACCAGGGCCATGTCCGTGTTAGCTACGACCCAGGCAGCTA 3839 

III II II II II II II I III I III llll lllll III I III I 
Qy 3600 ccatatcgcggtagaactctatcgggggcgtgttcgtgccagctatgacaccggctctca 3659 

Db 3840 CCCCAGCTCTGCTATCTACAGTGCTGAAACAATCAACGATGGGCAGTTCCACACAGTTGA 3899 

II inn II lllllll II llllllll lllll I MUM || || 

Qy 3660 tccagcttctgccatttacagtgtggagacaatcaatgatggaaacttccacattgtgga 3719 

Db 3900 GCTGGTGACCTTTGACCAGATGGTGAACCTCTCCATCGATGGTGGCAGCCCCATGACCAT 3959 

II I llll II llll I I III I llllllll I lllll I III 
Qy 3720 actacttgccttggatcagagtctctctttgtccgtggatggtgggaaccccaaaatcat 3779 

Db 3960 GGACAACTTTGGAAAGCACTACACACTCAACAGTGAGGCCCCCCTCTATGTGGGAGGGAT 4019

m mn mm i m n n m i n ininii inn n 

Qy 3780 cactaacttgtcaaagcagtccactctgaattttgactctccactctatgtaggaggcat 3839

Db 4020 GCCCGTGGATGTGAACTCAGCTGCCTTCCGCCTGTGGCAGATCCTCAATGGCACCAGCTT 4079

III I II III II I | MM || | || || llllllll 
Qy 3840 gccagggaagagtaacgtggcatctctgcgccaggcccctgggcagaacggaaccagctt 3899 

Db 4080 CCACGGTTGCATCCGGAATCTATACATCAACAACGAACTGCAGGACTTCACCAAGACACA 4139 

llllll IMIIIIMI II llllllllll || | M 1 1 1 1 1 1 1 1 1 III I 
Qy 3900 ccacggctgcatccggaacctttacatcaacagtgagctgcaggacttccagaaggtgcc 3959 

Db 4140 GATGAAGCCGGGCGTGGTGCCCGGCTGCGAGCCCTGCCGAAAACTCTACTGTCTACATGG 4199

mi i i iii i iiii iiiii iiiii mi ii iii mn 

Qy 3960 gatgcaaacaggcattttgcctggctgtgagccatgccacaagaaggtgtgtgcccatgg 4019 



Db 4200 CATTTGCCAGCCCAACGCCACCCCAGGGCCCGTGTGCCACTGCGAGGCTGGCTGGGGGGG 4259 

n miiimi I I llll I III I III III II III III 

Qy 4020 cacatgccagcccagcagccaggcaggcttcacctgcgagtgccaggaaggatggatggg 4079 

Db 4260 CCTGCACTGTGACCAGCCAGTGGACGGCCCCTGCCATGGCCACAAGTGTGTCCATGGGAA 4319 

I I lllllllll I I I III llll III I II II II lllll | 

Qy 4080 gcccctctgtgaccaacggaccaatgacccttgccttggaaataaatgcgtacatggcac 4139 

Db 4320 ATGCGTGCCCCTCGATGCACTTGCCTACAGCTGCCAATGCCAGGATGGATACTCGGGGGC 4379 



in mn ii nn i iniiinn i in m n i n i 

Qy 4140 ctgcttgcccatcaatgcgttctcctacagctgtaagtgcttggagggccatggaggtgt 4199

Db 4380 TCTGTGCAACCAGGTCGGGGCTGTGGCAGAGCCCTGTGGGGGCTTACAGTGCCTGCATGG 4439

II III I I I II I II I II II II I lllll hum 
Qy 4200 cctctgtgatgaagaggaggatctgtttaacccatgccaggcgatcaagtgcaagcatgg 4259



Db 4440 TCACTGCCAGGCCTCGGCCACCAGAGGGGCACACTGTGTGTGCAGCCCAGGTTTTTCAGG 4499 

Mil I II I III llllll llllll II I Ml 
Qy 4260 gaagtgcaggctttcaggtctggggcagccctactgtgaatgcagcagtggatacacggg 4319 

Db 4500 CGAGCTGTGTGAGCAAGAGTCCGAGTGCCGGGGGGACCCTGTCCGGGACTTTCACCGGGT 4559

II lllll I III I II II lllll I MM I III 

Qy 4320 ggacagctgtgatcgagaaatctcttgtcgaggggaaaggataagagattattaccaaaa 4379 

Db 4560 CCAGAGGGGCTATGCCATCTGCCAGACCACGCGCCCACTGTCATGGGTGGAATGCCGGGG 4619 

III lllllllll lllll II II llll I I II III I II 

Qy 4380 gcagcagggctatgctgcttgccaaacaaccaagaaggtgtcccgattagagtgcagagg 4439 

Db 4620 CGCGTGCCCGGGCCAGGGCTGCTGCCAGGGCCTGCGGCTGAAGCGGAGGAAGCTCACCTT 4679

I III I II I lllll III II Mill | | M 

Qy 4440 tgggtgtgcaggagggcagtgctgtggaccgctgaggagcaagcggcggaaatactcttt 4499 

Db 4680 CGAGTGCAGCGATGGGACCTCGTTTGCTGAGGAGGTGGAGAAGCCCACCAAGTGCGGCTG 4739

III llll II II llll llll II lllll Mill 

Qy 4500 cgaatgcactgacggctcctcctttgtggacgaggttgagaaagtggtgaagtgcggctg 4559 

Db 4740 TGCCCCGTGTGCGT 4753

I I lllll II 
Qy 4560 tacgaggtgtgtgt 4573 



LOCUS AF075240 2553 bp mRNA PRI 09-MAR-1999 

DEFINITION Homo sapiens SLIT1 protein (SLIL2) mRNA, partial cds. 

ACCESSION AF075240 

NID g4377994 

VERSION AF075240.1 GI:4377994 

KEYWORDS 

SOURCE human. 
ORGANISM Homo sapiens 

Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; 
Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 2553)

AUTHORS Holmes,G.P., Negus,K., Burridge,L., Raman,S., Algar,E., Yamada,T.
and Little,M.H.

TITLE Distinct but overlapping expression patterns of two vertebrate slit
homologs implies functional roles in CNS development and
organogenesis

JOURNAL Mech. Dev. 79 (1-2), 57-72 (1998)
REFERENCE 2 (bases 1 to 2553)

AUTHORS Holmes,G.P., Negus,K., Raman,S. and Little,M.

TITLE Direct Submission

JOURNAL Submitted (29-JUN-1998) Centre for Molecular and Cellular Biology,
University of Queensland, St. Lucia, Brisbane, Queensland 4072, 
Australia 

FEATURES Location/Qualifiers
source 1..2553
/organism="Homo sapiens"
/db_xref="taxon:9606"
/db_xref="dbEST:T65521"
/chromosome="5"
/map="5q35"
/clone="IMAGE Consortium ID 21651"
gene <1..2553
/note="SLIT1"
/gene="SLIL2"
CDS <1..2553
/gene="SLIL2"
/note="extracellular leucine-rich protein; contains
leucine-rich and EGF-like repeats; similar to Drosophila
neurogenic extracellular slit protein"
/codon_start=1
/product="SLIT1 protein"
/protein_id="AAD19336.1"
/db_xref="PID:g4377995"
/db_xref="GI:4377995"

/translation="EGAFDGAASVQELMLTGNQLETVHGRVFRGLSGLKTLMLRSNLI 
GCVSNDTFAGLSSVRLLSLYDNRITTITPGAFTTLVSLSTINLLSNPFNCNCHLAWLG
KWLRKRRIVSGNPRCQKPFFLKEIPIQDVAIQDFTCDGNEESSCQLSPRCPEQCTCME 
TVVRCSNKGLRALPRGMPKDVTELYLEGNHLTAVPRELSALRHLTLIDLSNNSISMLT 
NYTFSNMSHLSTLILSYNRLRCIPVHAFNGLRSLRVLTLHGNDISSVPEGSFNDLTSL 
SHLALGTNPLHCDCSLRWLSEWVKAGYKEPGIARCSSPEPMADRLLLTTPTHRFQCKG 
PVDINIVAKCNACLSSPCKNNGTCTQDPVELYRCACPYSYKGKDCTVPINTCIQNPCQ 
HGGTCHLSDSHKDGFSCSCPLGFEGQRCEINPDDCEDNDCENNATCVDGINNYVCICP 
PNYTGELCDEVIDHCVPELNLCQHEAKCIPLDKGFSCECVPGYSGKLCETDNDDCVAH 
KCRHGAQCVDTINGYTCTCPQGFSGPFCEHPPPMVYDSLSSPPTTVYSVETVNDGQFH 
SVELVTLNQTLNLVVDKGTPKSLGKLQKQPAVGINSPLYLGGIPTSTGLSALRQGTDR 
PLGGFHGCIHEVRINNELQDFKALPPQSLGVSPGCKSCTVCKHGLCRSVEKDSVVCEC
RPGWTGPLCDQEARDPCLGHRCHHGKCVATGTSYMCKCAEGYGGDLCDNKNDSANACS 
AFKCHHGQCHISDQGEPYCLCQPGFSGEHCQQENPCLGQVVREVIRRQKGYASCATAS 
KVPIMECRGGCGPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGCLACS" 

misc_feature 10..69
/gene="SLIL2"
/note="Region: LRR3-2"
misc_feature 82..141
/gene="SLIL2"
/note="Region: LRR3-3"
misc_feature 154..213
/gene="SLIL2"
/note="Region: LRR3-4"
misc_feature 226..261
/gene="SLIL2"
/note="Region: LRR3-5"
misc_feature 247..264
/gene="SLIL2"
/note="Region: LRR4-4"
misc_feature 268..453
/gene="SLIL2"
/note="Region: LRR carboxy-flanking region"
misc_feature 454..549
/gene="SLIL2"
/note="Region: LRR amino-flanking region"
misc_feature 550..582
/gene="SLIL2"
/note="Region: LRR4-1"
misc_feature 592..651
/gene="SLIL2"
/note="Region: LRR4-2"
misc_feature 664..723
/gene="SLIL2"
/note="Region: LRR4-3"
misc_feature 808..843
/gene="SLIL2"
/note="Region: LRR4-5"
misc_feature 850..1023
/gene="SLIL2"
/note="Region: LRR carboxy-flanking region"
misc_feature 1033..1140
/gene="SLIL2"
/note="Region: EGF1"
misc_feature 1144..1263
/gene="SLIL2"
/note="Region: EGF2"
misc_feature 1267..1377
/gene="SLIL2"
/note="Region: EGF3"
misc_feature 1381..1497
/gene="SLIL2"
/note="Region: EGF4"
misc_feature 1501..1611
/gene="SLIL2"
/note="Region: EGF5"
misc_feature 1966..2076
/gene="SLIL2"
/note="Region: EGF6A"
misc_feature 2083..2190
/gene="SLIL2"
/note="Region: EGF7"
misc_feature 2203..2313
/gene="SLIL2"
/note="Region: EGF8"
misc_feature 2413..2550
/gene="SLIL2"
/note="Region: CTC Knot"
BASE COUNT a 798 c 719 g 475 t
ORIGIN



Query Match 10.6%; Score 505; DB 31; Length 2553; 

Best Local Similarity 65.9%; Pred. No. 0.00e+00; 

Matches 1076; Conservative 0; Mismatches 553; Indels 3; Gaps 1; 
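
The "Best Local Similarity" figure reported for each hit appears to be the number of matches divided by the total number of aligned columns (matches + mismatches + indels). That definition is an assumption inferred from the numbers printed in this listing, not something the search program states. A minimal Python sketch, with a hypothetical helper name, that checks the assumption against two of the summaries above:

```python
def local_similarity(matches: int, mismatches: int, indels: int) -> float:
    """Percent of aligned columns that are exact matches.

    Assumed definition: matches / (matches + mismatches + indels).
    Inferred from the summary lines in this listing, not documented.
    """
    aligned_columns = matches + mismatches + indels
    return 100.0 * matches / aligned_columns


# Counts copied from the AB011530 and AF075240 summary lines.
hits = {
    "AB011530": (2928, 1553, 33),
    "AF075240": (1076, 553, 3),
}

for accession, (m, mm, ind) in hits.items():
    print(accession, round(local_similarity(m, mm, ind), 1))
```

Under this assumption the rounded values agree with the 64.9% and 65.9% printed for those two hits.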

Db 1 GAGGGAGCTTTCGATGGAGCAGCCAGCGTGCAGGAGCTGATGCTGACAGGGAACCAGCTG 60 

llllllll II II llllll I I II I II I I II II I II I II 
Qy 1711 gagggagcatttgaaggagcatctggtgtaaatgaaatacttcttacgagtaatcgtttg 1770 

Db 61 GAGACCGTGCACGGGCGCGTGTTCCGTGGCCTCAGTGGCCTCAAAACCTTGATGCTGAGG 120 

II I Mill Mill II. I II 1 1 II I II I llllll MM 

Qy 1771 gaaaatgtgcagcataagatgttcaagggattggaaagcctcaaaactttgatgttgaga 1830 



II II II lllllll I IMIIII II II II Mill III I II 
Qy 1831 agcaatcgaataacctgtgtggggaatgacagtttcataggactcagttctgtgcgtttg 1890 

Db 181 CTGTCCCTCTATGACAATCGGATCACCACCATCACCCCTGGGGCCTTCACCACGCTTGTC 240 

II II I Mill MM II II II I I II Mill II II II 
Qy 1891 ctttctttgtatgataatcaaattactacagttgcaccaggggcatttgatactctccat 1950 

Db 241 TCCCTGTCCACCATAAACCTCCTGTCCAACCCCTTCAACTGCAACTGCCACCTGGCCTGG 300 

II I II II llllllll II MM II II Mill llllll lllllll Ml 
Qy 1951 tctttatctactctaaacctcttggccaatccttttaactgtaactgctacctggcttgg 2010 

Db 301 CTCGGCAAGTGGTTGAGGAAGAGGCGGATCGTCAGTGGGAACCCTAGGTGCCAGAAGCCA 360

I II Mill MM MM I I II MM M II Mill II M M III 
Qy 2011 ttgggagagtggctgagaaagaagagaattgtcacgggaaatcctagatgtcaaaaacca 2070 

Db 361 TTTTTCCTCAAGGAGATTCCCATCCAGGATGTGGCCATCCAGGACTTCACCTGTGATGGC 420 

I Mill II II II I ' I ' 1 1 1 1 1 1 Ml! MIMIIMII lllllll I 

Qy 2071 tacttcctgaaagaaatacccatccaggatgtggccattcaggacttcacttgtgatgac 2130 

Db 421 --AACGAGGAGAGTAGCTGCCAGCTGAGCCCGCGCTGCCCGGAGCAGTGCACCTGTATG 477 

II M II I III III I I Mill II I M II II II 
Qy 2131 ggaaatgatgacaatagttgctccccactttctcgctgtcctactgaatgtacttgcttg 2190 

Db 478 GAGACAGTGGTGCGATGCAGCAACAAGGGGCTCCGCGCCCTCCCCAGAGGCATGCCCAAG 537 

II Mill II Mill IMIMIMM I I I I M I III II II I 

Qy 2191 gatacagtcgtccgatgtagcaacaagggtttgaaggtcttgccgaaaggtattccaaga 2250 

Db 538 GATGTGACCGAGCTGTACCTGGAAGGAAACCACCTAACAGCCGTGCCCAGAGAGCTGTCC 597

Mill II III MM Mill MMMM I III II MM II II Ml 
Qy 2251 gatgtcacagagttgtatctggatggaaaccaatttacactggttcccaaggaactctcc 2310 

Db 598 GCCCTCCGACACCTGACGCTTATTGACCTGAGCAACAACAGCATCAGCATGCTGACCAAT 657 

I I III I M Mill III I M llllllll M Mil III I Ml 
Qy 2311 aactacaaacatttaacacttatagacttaagtaacaacagaataagcacgctttctaat 2370 

Db 658 TACACCTTCAGTAACATGTCTCACCTCTCCACTCTGATCCTGAGCTACAACCGGCTGAGG 717

I I llllll llllll I II III III I II M II llllllll Mill 
Qy 2371 cagagcttcagcaacatgacccagctcctcaccttaattcttagttacaaccgtctgaga 2430 

Db 718 TGCATCCCCGTCCACGCCTTCAACGGGCTGCGGTCCCTGCGAGTGCTAACCCTCCATGGC 777 

II II II II MM I II I III II III III I II Mill 
Qy 2431 tgtattcctcctcgcacctttgatggattaaagtctcttcgattactttctctacatgga 2490 

Db 778 AATGACATTTCCAGCGTTCCTGAAGGCTCCTTCAACGACCTCACATCTCTTTCCCATCTG 837

milium m iiinni i inn n n i i i n inn 

Qy 2491 aatgacatttctgttgtgcctgaaggtgctttcaatgatctttctgcattatcacatcta 2550 
Db 838 GCGCTGGGAACCAACCCACTCCACTGTGACTGCAGTCTTCGGTGGCTGTCGGAGTGGGTG 897

1 1 I III I llllll II 1 1 II I II IN Ml 

Qy 2551 gcaattggagccaaccctctttactgtgattgtaacatgcagtggttatccgactgggtg 2610

Db 898 AAGGCGGGGTACAAGGAGCCTGGCATCGCCCGCTGCAGTAGCCCTGAGCCCATGGCTGAC 957

m in ii minimi n n n n m mi urn n 

Qy 2611 aagtcggaatataaggagcctggaattgctcgttgtgctggtcctggagaaatggcagat 2670

Db 958 AGGCTCCTGCTCACCACCCCAACCCACCGCTTCCAGTGCAAAGGGCCAGTGGACATCAAC 1017

I II I Mill II II II I II II Mil II INN MM 

Qy 2671 aaacttttactcacaactccctccaaaaaatttacctgtcaaggtcctgtggatgtcaat 2730

Db 1018 ATTGTGGCCAAATGCAATGCCTGCCTCTCCAGCCCGTGCAAGAATAACGGGACATGCACC 1077

in i ii ii ii ii mini ii i inn ii iii i ii inn I 

Qy 2731 attctagctaagtgtaacccctgcctatcaaatccgtgtaaaaatgatggcacatgtaat 2790

Db 1078 CAGGACCCTGTGGAGCTGTACCGCTGTGCCTGCCCCTACAGCTACAAGGGCAAGGACTGC 1137

n ii ii ii i inn ii mi ii n ii iiiiii limn 

Qy 2791 agtgatccagttgacttttaccgatgcacctgtccatatggtttcaaggggcaggactgt 2850

Db 1138 ACTGTGCCCATCAACACCTGCATCCAGAACCCCTGTCAGCATGGAGGCACCTGCCACCTG 1197

in n n i n inn iii i innni ii inin i 

Qy 2851 gatgtcccaattcatgcctgcatcagtaacccatgtaaacatggaggaacttgccactta 2910

Db 1198 AGTGACAGCCACAAGGATGGGTTCAGCTGCTCCTGCCCTCTGGGCTTTGAGGGGCAGCGG 1257

I II I I I Ill I II || || || HIM || | 

Qy 2911 aaggaaggagaagaagatggattctggtgtatttgtgctgatggatttgaaggagaaaat 2970

Db 1258 TGTGAGATCAACCCAGATGACTGTGAGGACAACGACTGCGAAAACAATGCCACCTGCGTG 1317

inn inn inn inn n n inn mn in i n n n 

Qy 2971 tgtgaagtcaacgttgatgattgtgaagataatgactgtgaaaataattctacatgtgtc 3030

Db 1318 GACGGGATCAACAACTACGTGTGTATCTGTCCGCCTAACTACACAGGTGAGCTATGCGAC 1377

i n n ii nun ii i ii ii in i ii innini i n n 

Qy 3031 gatggcattaataactacacatgcctttgcccacctgagtatacaggtgagttgtgtgag 3090

Db 1378 GAGGTGATTGACCACTGTGTGCCTGAGCTGAACCTCTGTCAGCATGAGGCCAAGTGCATC 1437

m i i in mn i ii mini m inn n i miinii 

Qy 3091 gagaagctggacttctgtgcccaggacctgaacccctgccagcacgattcaaagtgcatc 3150

Db 1438 CCCCTGGACAAAGGATTCAGCTGCGAGTGTGTCCCTGGCTACAGCGGGAAGCTCTGTGAG 1497

i n mini n ii ii ii n i n i i in n 

Qy 3151 ctaactccaaagggattcaaatgtgactgcacaccagggtacgtaggtgaacactgcgac 3210

Db 1498 ACAGACAATGATGACTGTGTGGCCCACAAGTGCCGCCACGGGGCCCAGTGCGTGGACACA 1557

n m mn 1 1 limn mi mn i n 

Qy 3211 atcgattttgacgactgccaagacaacaagtgtaaaaacggagcccactgcacagatgca 3270

Db 1558 ATCAATGGCTACACATGCACCTGCCCCCAGGGCTTCAGTGGACCCTTCTGTGAACACCCC 1617

i n mn ii mi mm i n i mm iiimii i 

Qy 3271 gtgaacggctatacgtgcatatgccccgaaggttacagtggcttgttctgtgagttttct 3330

Db 1618 CCACCCATGGTC 1629

minimi 

Qy 3331 ccacccatggtc 3342



RESULT 9
LOCUS AB011538 6921 bp mRNA PRI 22-AUG-1998
DEFINITION Homo sapiens mRNA for MEGF5, partial cds.
ACCESSION AB011538
NID g3449301
VERSION AB011538.1 GI:3449301
KEYWORDS MEGF5.
SOURCE Homo sapiens male brain cDNA to mRNA, clone_lib:pBluescriptII SK
plus clone:HG2635.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria;
Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 6921)
AUTHORS Nakayama,M., Nakajima,D. and Ohara,O.
TITLE Direct Submission
JOURNAL Submitted (26-FEB-1998) to the DDBJ/EMBL/GenBank databases. Manabu
Nakayama, Kazusa DNA Research Institute, Laboratory of DNA
technology; 1532-3, Yana, Kisarazu, Chiba 292-0812, Japan
(E-mail:nmanabu@kazusa.or.jp, Tel:+81-438-52-3915,
Fax:+81-438-52-3914)
REFERENCE 2 (sites)
AUTHORS Nakayama,M., Nakajima,D., Nagase,T., Nomura,N., Seki,N. and
Ohara,O.
TITLE Identification of high-molecular-weight proteins with multiple
EGF-like motifs by motif-trap screening
JOURNAL Genomics 51 (1), 27-34 (1998)
MEDLINE
FEATURES Location/Qualifiers
source 1..6921
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome="5"
/clone="HG2635"
/clone_lib="pBluescriptII SK plus"
/map="5q35"
/sex="male"
/tissue_type="brain"
gene 1..2221
/gene="MEGF5"
CDS <1..2221
/gene="MEGF5"
/note="human homologue of Drosophila slit protein; human
slit2"
/codon_start=2
/product="MEGF5"
/protein_id="BAA32466.1"
/db_xref="PID:d1033429"
/db_xref="PID:g3449302"
/db_xref="GI:3449302"
/translation="NSISMLTNYTFSNMSHLSTLILSYNRLRCIPVHAFNGLRSLRVL
TLHGNDISSVPEGSFNDLTSLSHLALGTNPLHCDCSLRWLSEWVKAGYKEPGIARCSS
PEPMADRLLLTTPTHRFQCKGPVDINIVAKCNACLSSPCKNNGTCTQDPVELYRCACP
YSYKGKDCTVPINTCIQNPCQHGGTCHLSDSHKDGFSCSCPLGFEGQRCEINPDDCED
NDCENNATCVDGINNYVCICPPNYTGELCDEVIDHCVPELNLCQHEAKCIPLDKGFSC
ECVPGYSGKLCETDNDDCVAHKCRHGAQCVDTINGYTCTCPQGFSGPFCEHPPPMVLL
QTSPCDQYECQNGAQCIVVQQEPTCRCPPGFAGPRCEKLITVNFVGKDSYVELASAKV
RPQANISLQVATDKDNGILLYKGDNDPLALELYQGHVRLVYDSLSSPPTTVYSVETVN
DGQFHSVELVTLNQTLNLVVDKGTPKSLGKLQKQPAVGINSPLYLGGIPTSTGLSALR
QGTDRPLGGFHGCIHEVRINNELQDFKALPPQSLGVSPGCKSCTVCKHGLCRSVEKDS
VVCECRPGWTGPLCDQEARDPCLGHRCHHGKCVATGTSYMCKCAEGYGGDLCDNKNDS
ANACSAFKCHHGQCHISDQGEPYCLCQPGFSGEHCQQENPCLGQVVREVIRRQKGYAS
CATASKVPIMECRGGCGPQCCQPTRSKRRKYVFQCTDGSSFVEEVERHLECGCLACS"
polyA_site 6921
/note="19 a nucleotides"
BASE COUNT 1820 a 1813 c 1708 g 1580 t
ORIGIN



Query Match 9.9%; Score 469; DB 29; Length 6921;
Best Local Similarity 61.6%; Pred. No. 0.00e+00;
Matches 1364; Conservative 0; Mismatches 841; Indels 9; Gaps 7;

Db 1 CAACAGCATCAGCATGCTGACCAATTACACCTTCAGTAACATGTCTCACCTCTCCACTCT 60

inin n mi in i m i i mm mm i n in in i 

Qy 2346 caacagaataagcacgctttctaatcagagcttcagcaacatgacccagctcctcacctt 2405 



Db 61 GATCCTGAGCTACAACCGGCTGAGGTGCATCCCCGTCCACGCCTTCAACGGGCTGCGGTC 120

ii ii ii inimi inn n n n 1 1 mi i n i

Qy 2406 aattcttagttacaaccgtctgagatgtattcctcctcgcacctttgatggattaaagtc 2465

Db 121 CCTGCGAGTGCTAACCCTCCATGGCAATGACATTTCCAGCGTTCCTGAAGGCTCCTTCAA 180

n in iii in inn milium n mimi i mn

Qy 2466 tcttcgattactttctctacatggaaatgacatttctgttgtgcctgaaggtgctttcaa 2525

Db 181 CGACCTCACATCTCTTTCCCATCTGGCGCTGGGAACCAACCCACTCCACTGTGACTGCAG 240

ii ii i i i ii mn ii i iii mini n iinni n i

Qy 2526 tgatctttctgcattatcacatctagcaattggagccaaccctctttactgtgattgtaa 2585



Db 241 TCTTCGGTGGCTGTCGGAGTGGGTGAAGGCGGGGTACAAGGAGCCTGGCATCGCCCGCTG 300

i i mi i n ii innini iii ii minimi n n n n

Qy 2586 catgcagtggttatccgactgggtgaagtcggaatataaggagcctggaattgctcgttg 2645

Db 301 CAGTAGCCCTGAGCCCATGGCTGACAGGCTCCTGCTCACCACCCCAACCCACCGCTTCCA 360

i i mi inn ii mi i inn n n n i n 

Qy 2646 tgctggtcctggagaaatggcagataaacttttactcacaactccctccaaaaaatttac 2705 

Db 361 GTGCAAAGGGCCAGTGGACATCAACATTGTGGCCAAATGCAATGCCTGCCTCTCCAGCCC 420 

ii mi n inn mi in i n n n n iinin n i n 

Qy 2706 ctgtcaaggtcctgtggatgtcaatattctagctaagtgtaacccctgcctatcaaatcc 2765 

Db 421 GTGCAAGAATAACGGGACATGCACCCAGGACCCTGTGGAGCTGTACCGCTGTGCCTGCCC 480 

in ii iii i ii inn i ii n n n i inn n mi n 

Qy 2766 gtgtaaaaatgatggcacatgtaatagtgatccagttgacttttaccgatgcacctgtcc 2825 

Db 481 CTACAGCTACAAGGGCAAGGACTGCACTGTGCCCATCAACACCTGCATCCAGAACCCCTG 540 

ii 1 1 mm mini in n n i iiiimi inn n 

Qy 2826 atatggtttcaaggggcaggactgtgatgtcccaattcatgcctgcatcagtaacccatg 2885 

Db 541 TCAGCATGGAGGCACCTGCCACCTGAGTGACAGCCACAAGGATGGGTTCAGCTGCTCCTG 600
I I iiiimi II III! I I II I I I Hill III I II II 

Qy 2886 taaacatggaggaacttgccacttaaaggaaggagaagaagatggattctggtgtatttg 2945 

Db 601 CCCTCTGGGCTTTGAGGGGCAGCGGTGTGAGATCAACCCAGATGACTGTGAGGACAACGA 660 

ii n inn ii i inn inn inn mi n n n 

Qy 2946 tgctgatggatttgaaggagaaaattgtgaagtcaacgttgatgattgtgaagataatga 3005 

Db 661 CTGCGAAAACAATGCCACCTGCGTGGACGGGATCAACAACTACGTGTGTATCTGTCCGCC 720 

iii inn iii i ii ii ii ii ii ii ii inni ii i n n n 

Qy 3006 ctgtgaaaataattctacatgtgtcgatggcattaataactacacatgcctttgcccacc 3065 

Db 721 TAACTACACAGGTGAGCTATGCGACGAGGTGATTGACCACTGTGTGCCTGAGCTGAACCT 780 

i i ii mmni i ii ii iii i i iii inn i n mini 

Qy 3066 tgagtatacaggtgagttgtgtgaggagaagctggacttctgtgcccaggacctgaaccc 3125 

Db 781 CTGTCAGCATGAGGCCAAGTGCATCCCCCTGGACAAAGGATTCAGCTGCGAGTGTGTCCC 840 

iii inn ii i iiiimin ii mini n n n n 

Qy 3126 ctgccagcacgattcaaagtgcatcctaactccaaagggattcaaatgtgactgcacacc 3185 

Db 841 TGGCTACAGCGGGAAGCTCTGTGAGACAGACAATGATGACTGTGTGGCCCACAAGTGCCG 900 

II III II I I III II I II III lllll I I 1 1 1 11 M 

Qy 3186 agggtacgtaggtgaacactgcgacatcgattttgacgactgccaagacaacaagtgtaa 3245 

Db 901 CCACGGGGCCCAGTGCGTGGACACAATCAATGGCTACACATGCACCTGCCCCCAGGGCTT 960 

mi iiiii iii ii ii i ii mn ii mi mm i n i 

Qy 3246 aaacggagcccactgcacagatgcagtgaacggctatacgtgcatatgccccgaaggtta 3305 

Db 961 CAGTGGACCCTTCTGTGAACACCCCCCACCCATGGTCCTACTGCAGACCAGCCCATGCGA 1020
mill inni' i iiniiiiiiim i i nun n n 

Qy 3306 cagtggcttgttctgtgagttttctccacccatggtcctccctcgtaccagcccctgtga 3365 

Db 1021 CCAGTACGAGTGCCAGAACGGGGCCCAGTGCATCGTGGTGCAGCAGGAGCCCACCTGCCG 1080 

II II II lllll II II lllll lllll I lllll I II I 

Qy 3366 taattttgattgtcagaatggagctcagtgtatcgtcagaataaatgagccaatatgtca 3425 

Db 1081 CTGCCCACCAGGCTTCGCCGGCCCCAGATGCGAGAAGCTCATCACTGTCAACTTCGTGGG 1140 

II II llll II I II II II I I I III II II I 

Qy 3426 gtgtttgcctggctatcagggagaaaagtgtgaaaaattggttagtgtgaattttataaa 3485 

Db 1141 CAAAGACTCCTACGTGGAACTGGCCTCCGCCAAGGTCCGACCCCAGGCCAACATCTCCCT 1200

mm ii in i i i ii Mih ii ii iii i iiiii i ii 

Qy 3486 caaagagtcttatcttcagattccttcagccaaggttcggcctcagacgaacataacact 3545 

Db 1201 GCAGGTGGCCACTGACAAGGACAACGGCATCCTTCTCTACAAAGGAGACAATGACCCCCT 1260 

III I lllll II I llll III lllll II II II II lllll llll I 

Qy 3546 tcagattgccacagatgaagacagcggaatcctcctgtataagggtgacaaagaccatat 3605 

Db 1261 GGCACTGGAGCTGTACCAGGGCCACGTGCGGCTGGTCTATGACAGCCTGAGTTCCCCTCC 1320

II I II II II I III I II II 1 1 1 1 M 1 1 I llll 

Qy 3606 cgcggtagaactctatcgggggcgtgttcgtgccagctatgacaccggctctcatccagc 3665 

Db 1321 AACCACAGTGTACAGTGTGGAGACAGTGAATGATGGGCAGTTTCACAGTGTGGAGCTGGT 1380 

i i i iniimniim i mih: i ii iiii mm n i 

Qy 3666 ttctgccatttacagtgtggagacaatcaatgatggaaacttccacattgtggaactact 3725 



Db 1381 GACGCTAAACCAGACCCTGAACCTAGTAGTGGACAAAGGAACTCCAAAGAGCCTGGGGAA 1440 

I I I llll II I lllll IN II II I I I II 
Qy 3726 tgccttggatcagagtctctctttgtccgtggatggtgggaaccccaaaatcatcactaa 3785 

Db 1441 GCTCCAGAAGCAGCCAGCAGTGGGCATCAACAGCCCCCTCTACCTTGGAGGCATCCCCAC 1500 

i mill i i ii i ii ii iiiii i iinnn ii 

Qy 3786 cttgtcaaagcagtccactctgaattttgactctccactctatgtaggaggcatgccagg 3845 
Db 1501 CTCCACCGGCCTCTCCGCCTTGCGCCAGGGCACGGACCGGCCTCTAGGCGGCTTCCACGG 1560 

i 1 1 ii mi: iiiii i i minim 

Qy 3846 gaagagtaacgtggcatctctgcgccaggcccctgggcagaacggaaccagcttccacgg 3905 
Db 1561 ATGCATCCATGAGGTGCGCATCAACAACGAGCTGCAGGACTTCAAGGCCCTCCCACCACA 1620 

mini i i mum Minimum n i n n 

Qy 3906 ctgcatccggaacctttacatcaacagtgagctgcaggacttccagaaggtgccgatgca 3965 
Db 1621 GTCCCTGGGGGTGTCAC-CAGGCTGCAA-GTC-CTGCACCGTGTGCAAGCACGGCCTGTG 1677 

i ii i i i ii i i i i i iiiii ii in ii 

Qy 3966 aacaggcattttgcctggctgtgagccatgccacaagaaggtgtgtgcccatggcacatg 4025

Db 1678 CCGCTCCGTGGAGAAGGACAGCGTGGTGTGCGAGTGCCGCCCAGGCTGGACCGGCCCACT 1737 

II II III II I llllllllll III llll II II II 

Qy 4026 ccagcccagcagccaggcaggcttcacctgcgagtgccaggaaggatggatggggcccct 4085 

Db 1738 CTGCGACCAGGAGGCCCGGGACCCCTGCCTCGGCCACAGATGCCACCATGGAAAATGTGT 1797 

III MM I II lllll lllll II I I llll lllll I II I 

Qy 4086 ctgtgaccaacggaccaatgacccttgccttggaaataaatgcgtacatggcacctgctt 4145 

Db 1798 GGCAA-C--TGGGACCTCATACATGTGCAAGTGTGCCGAGGGCTATGGAGGGGACTTGTG 1854

iiii n i in iiii ii mn nun i mm i i mi 

Qy 4146 gcccatcaatgcgttctcctacagctgtaagtgcttggagggccatggaggtgtcctctg 4205 

Db 1855 TGACAACAAGAATGACTCTGCCAATGCCTGCTCAGCCTTCAAGTGTCACCATGGGCAGTG 1914 

III I II I II II I III II Mllll I MM Ml 
Qy 4206 tgatgaagaggaggatctgtttaacccatgccaggcgatcaagtgcaagcatgggaagtg 4265 

Db 1915 CCACATCTCAGACCAAGGGGAGCCCTACTGCCTGTGCCAGCCCGGCTTTAGCGGCGAGCA 1974

I I llll I III llllllllll III II I I II II 

Qy 4266 caggctttcaggtctggggcagccctactgtgaatgcagcagtggatacacgggggacag 4325 

Db 1975 CTGCCAACAAGAGAATCCGTGCCTGGGACAAGTAGTCCGAGAGGTGATCCGCCGCCAGAA 2034 

III I I III I I II I II II I llll II III I 
Qy 4326 ctgtgatcgagaaatctcttgtcgaggggaaaggataagagattattaccaaaagcagca 4385 

Db 2035 AGGTTATGCATCATGTGCCACAGCCTCCAAGGTGCCCATCATGGAATGTCGTGGGGGCTG 2094 

ii iiiii i ii iii ii mm ii i ii ii i ii ii ii 

Qy 4386 gggctatgctgcttgccaaacaaccaagaaggtgtcccgattagagtgcagaggtgggtg 4445 

Db 2095 TG--GGCCC-CAGTGCTGCCAGCCCACCCGCAGCAAGCGGCGGAAATACGTCTTCCAGTG 2151

II II llllllll II I llllllllllllinill III I II 

Qy 4446 tgcaggagggcagtgctgtggaccgctgaggagcaagcggcggaaatactctttcgaatg 4505 

Db 2152 CACGGACGGCTCCTCGTTTGTAGAAGAGGTGGAGAGACACTTAGAGTGCGGCTG 2205 

in minimi mn n mn mi i i mm i 

Qy 4506 cactgacggctcctcctttgtggacgaggttgagaaagtggtgaagtgcggctg 4559 



RESULT 10 

LOCUS AB017170 591 bp mRNA ROD 06-FEB-1999
DEFINITION Rattus norvegicus mRNA for Slit-1 protein, partial cds.
ACCESSION AB017170
NID g4049590
VERSION AB017170.1 GI:4049590
KEYWORDS slit-1; Slit-1 protein.
SOURCE Rattus norvegicus tissue_lib:brain cDNA to mRNA.
ORGANISM Rattus norvegicus
Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria;
Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
REFERENCE 1 (bases 1 to 591)
AUTHORS Itoh,A. and Sakano,S.
TITLE Direct Submission
JOURNAL Submitted (27-AUG-1998) to the DDBJ/EMBL/GenBank databases. Akira
Itoh, Asahi Chemical Industry co., ltd., Life Science Fundamental
Research Laboratory; 2-1, Samejima, Fuji, Shizuoka 416-8501, Japan
(E-mail:a86114B3@ut.asahi-kasei.co.jp, Tel:+81-545-62-3231,
Fax:+81-545-62-3249)
REFERENCE 2 (sites)
AUTHORS Itoh,A., Miyabayashi,T., Ohno,M. and Sakano,S.
TITLE Cloning and expressions of three mammalian homologues of Drosophila
slit suggest possible roles for Slit in the formation and
maintenance of the nervous system
JOURNAL Brain Res. Mol. Brain Res. 62 (2), 175-186 (1998)
MEDLINE 99033071
FEATURES Location/Qualifiers
source 1..591
/organism="Rattus norvegicus"
/db_xref="taxon:10116"
/tissue_lib="brain"
gene 2..589
/gene="slit-1"
CDS <2..>589
/gene="slit-1"
/codon_start=1
/product="Slit-1 protein"
/protein_id="BAA35187.1"
/db_xref="PID:d1036173"
/db_xref="PID:g4049591"
/db_xref="GI:4049591"
/translation="IAVELYQGHVRVSYDPGSYPSSAIYSAETINDGQFHTVELVTFD
QMVNLSIDGGSPMTMDNFGKHYTLNSEAPLYVGGMPVDVNSAAFRLWQILNGTSFHGC
IRNLYINNELQDFTKTQMKPGVVPGCEPCRKLYCLHGICQPNATPGPVCHCEAGWGGL
HCDQPVDGPCHGHKCVHGKCVPLDALAYSCQCQDGY"
BASE COUNT 129 a 185 c 161 g 116 t
ORIGIN



Query Match 3.8%; Score 182; DB 32; Length 591;
Best Local Similarity 65.6%; Pred. No. 1.15e-110;
Matches 383; Conservative 0; Mismatches 201; Indels 0; Gaps 0;

Db 2 ATTGCAGTTGAGCTGTACCAGGGCCATGTCCGTGTTAGCTACGACCCAGGCAGCTACCCC 61 

II II II I! II II I III I III III! Hill III I III' | || 
Qy 3604 atcgcggtagaactctatcgggggcgtgttcgtgccagctatgacaccggctctcatcca 3663

Db 62 AGCTCTGCTATCTACAGTGCTGAAACAATCAACGATGGGCAGTTCCACACAGTTGAGCTG 121 

inn ii iiiiiii ii iiiinii inn i Minn n n i 

Qy 3664 gcttctgccatttacagtgtggagacaatcaatgatggaaacttccacattgtggaacta 3723
Db 122 GTGACCTTTGACCAGATGGTGAACCTCTCCATCGATGGTGGCAGCCCCATGACCATGGAC 181 

i mi n mi i i iii i mini i inn i in 

Qy 3724 cttgccttggatcagagtctctctttgtccgtggatggtgggaaccccaaaatcatcact 3783
Db 182 AACTTTGGAAAGCACTACACACTCAACAGTGAGGCCCCCCTCTATGTGGGAGGGATGCCC 241 

• him mill i iii ii ii iii i ii iimin inn urn 
Qy 3784 aacttgtcaaagcagtccactctgaattttgactctccactctatgtaggaggcatgcca 3843

Db 242 GTGGATGTGAACTCAGCTGCCTTCCGCCTGTGGCAGATCCTCAATGGCACCAGCTTCCAC 301 
I I I III II I I llll II Ml II 1 1 1 1 1 1 1 1 fl 1 1 

Qy 3844 gggaagagtaacgtggcatctctgcgccaggcccctgggcagaacggaaccagcttccac 3903

Db 302 GGTTGCATCCGGAATCTATACATCAACAACGAACTGCAGGACTTCACCAAGACACAGATG 361 

ii Minimi ii minim n inmniiii m i mi 

Qy 3904 ggctgcatccggaacctttacatcaacagtgagctgcaggacttccagaaggtgccgatg 3963
Db 362 AAGCCGGGCGTGGTGCCCGGCTGCGAGCCCTGCCGAAAACTCTACTGTCTACATGGCATT 421 

i i in i mi inn inn mi n m mini 

Qy 3964 caaacaggcattttgcctggctgtgagccatgccacaagaaggtgtgtgcccatggcaca 4023

Db 422 TGCCAGCCCAACGCCACCCCAGGGCCCGTGTGCCACTGCGAGGCTGGCTGGGGGGGCCTG 481 

MINIMI! I I llll I III I III III II III III I 
Qy 4024 tgccagcccagcagccaggcaggcttcacctgcgagtgccaggaaggatggatggggccc 4083

Db 482 CACTGTGACCAGCCAGTGGACGGCCCCTGCCATGGCCACAAGTGTGTCCATGGGAAATGC 541 

I IIMMIII I I I III Mil III I II II II lllll I III 
Qy 4084 ctctgtgaccaacggaccaatgacccttgccttggaaataaatgcgtacatggcacctgc 4143

Db 542 GTGCCCCTCGATGCACTTGCCTACAGCTGCCAGTGCCAGGATGG 585 



inn n iiii i minim urn in n 

Qy 4144 ttgcccatcaatgcgttctcctacagctgtaagtgcttggaggg 4187 



RESULT 11 

LOCUS DMSLIT 5401 bp mRNA INV 14-FEB-1991 

DEFINITION Drosophila mRNA for slit protein.

ACCESSION X53959 

NID g8614 

VERSION X53959.1 GI:8614

KEYWORDS extracellular protein; morphogenesis protein; slit gene. 
SOURCE fruit fly. 
ORGANISM Drosophila melanogaster 

Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta; 
Pterygota; Diptera; Brachycera; Muscomorpha; Ephydroidea; 
Drosophilidae; Drosophila. 
REFERENCE 1 (bases 1 to 5401) 
AUTHORS Rothberg, J.M. 
TITLE Direct Submission 

JOURNAL Submitted (18-JUL-1990) J.M. Rothberg, YALE UNIVERSITY, DEPT. OF 

BIOLOGY, P.O.BOX 6666, NEW HAVEN, CT 06520, USA 
REFERENCE 2 (bases 1 to 5401) 
AUTHORS Rothberg,J.M., Jacobs,J.R., Goodman,C.S. and Artavanis-Tsakonas,S.
TITLE slit: an extracellular protein necessary for development of midline
glia and commissural axon pathways contains both EGF and LRR
domains

JOURNAL Genes Dev. 4 (12A), 2169-2187 (1990) 
MEDLINE 91099565 
FEATURES Location/Qualifiers
source 1..5401
/organism="Drosophila melanogaster"
/db_xref="taxon:7227"
/dev_stage="embryo"
mRNA 1..5401
/gene="sli"
/db_xref="FlyBase:FBgn0003425"
/evidence=experimental
gene 1..5401
/note="slit"
/gene="sli"
/allele=""
/db_xref="FlyBase:FBgn0003425"
sig_peptide 251..358
/gene="sli"
/db_xref="FlyBase:FBgn0003425"
/product="slit protein"
CDS 251..4693
/gene="sli"
/codon_start=1
/product="slit protein"
/protein_id="CAA37910.1"
/db_xref="PID:g8615"
/db_xref="GI:8615"
/db_xref="FLYBASE:FBgn0003425"
/db_xref="SWISS-PROT:P24014"
/translation="MAAPSRTTLMPPPFRLQLRLLILPILLLLRHDAVHAEPYSGGFG
SSAVSSGGLGSVGIHIPGGGVGVITEARCPRVCSCTGLNVDCSHRGLTSVPRKISADV 
ERLELQGNNLTVIYETDFQRLTKLRMLQLTDNQIHTIERNSFQDLVSLERLDISNNVI 
TTVGRRVFKGAQSLRSLQLDNNQITCLDEHAFKGLVELEILTLNNNNLTSLPHNIFGG 
LGRLRALRLSDNPFACDCHLSWLSRFLRSATRLAPYTRCQSPSQLKGQNVADLHDQEF
KCSGLTEHAPMECGAENSCPHPCRCADGIVDCREKSLTSVPVTLPDDTTDVRLEQNFI 
TELPPKSFSSFRRLRRIDLSNNNISRIAHDALSGLKQLTTLVLYGNKIKDLPSGVFKG 
LGSLRLLLLNANEISCIRKDAFRDLHSLSLLSLYDNNIQSLANGTFDAMKSMKTVHLA 
KNPFICDCNLRWLADYLHKNPIETSGARCESPKRMHRRRIESLREEKFKCSWGELRMK 
LSGECRMDSDCPAMCHCEGTTVDCTGRRLKEIPRDIPLHTTELLLNDNELGRISSDGL 
FGRLPHLVKLELKRNQLTGIEPNAFEGASHIQELQLGENKIKEISNKMFLGLHQLKTL 
NLYDNQISCVMPGSFEHLNSLTSLNLASNPFNCNCHLAWFAECVRKKSLNGGAARCGA 
PSKVRDVQIKDLPHSEFKCSSENSEGCLGDGYCPPSCTCTGTVVACSRNQLKEIPRGI
PAETSELYLESNEIEQIHYERIRHLRSLTRLDLSNNQITILSNYTFANLTKLSTLIIS 
YNKLQCLQRHALSGLNNLRVVSLHGNRISMLPEGSFEDLKSLTHIALGSNPLYCDCGL
KWFSDWIKLDYVEPGIARCAEPEQMKDKLILSTPSSSFVCRGRVRNDILAKCNACFEQ 
PCQNQAQCVALPQREYQCLCQPGYHGKHCEFMIDACYGNPCRNNATCTVLEEGRFSCQ 
CAPGYTGARCETNIDDCLGEIKCQNNATCIDGVESYKCECQPGFSGEFCDTKIQFCSP 
EFNPCANGAKCMDHFTHYSCDCQAGFHGTNCTDNIDDCQNHMCQNGGTCVDGINDYQC
RCPDDYTGKYCEGHNMISMMYPQTSPCQNHECKHGVCFQPNAQGSDYLCRCHPGYTGK 
WCEYLTSISFVHNNSFVELEPLRTRPEANVTIVFSSAEQNGILMYDGQDAHLAVELFN 
GRIRVSYDVGNHPVSTMYSFEMVADGKYHAVELLAIKKNFTLRVDRGLARSIINEGSN 
DYLKLTTPMFLGGLPVDPAQQAYKNWQIRNLTSFKGCMKEVWINHKLVDFGNAQRQQK 
ITPGCALLEGEQQEEEDDEQDFMDETPHIKEEPVDPCLENKCRRGSRCVPNSNARDGY 
QCKCKHGQRGRYCDQGEGSTEPPTVTAASTCRKEQVREYYTENDCRSRQPLKYAKCVG 
GCGNQCCAAKIVRRRKVRMVCSNNRKYIKNLDIVRKCGCTKKCY" 
mat_peptide 359..4690
/gene="sli"
/db_xref="FlyBase:FBgn0003425"
/product="slit protein"
misc_feature 4495..4527
/gene="sli"
/note="alternatively spliced segment"
/db_xref="FlyBase:FBgn0003425"
BASE COUNT 1313 a 1504 c 1528 g 1056 t
ORIGIN



Query Match 2.2%; Score 106; DB 21; Length 5401;
Best Local Similarity 59.5%; Pred. No. 2.11e-52;
Matches 331; Conservative 0; Mismatches 225; Indels 0; Gaps 0;



Db 1132 CTGTCCGCACCCATGTCGCTGTGCGGACGGGATCGTCGATTGCCGTGAGAAGAGTCTGAC 1191 

iii ii i i iii mi ii inn ii ii mi in nn n 

Qy 828 ctgccctgccgcctgtacctgtagcaacaatatcgtagactgtcgtgggaaaggtctcac 887 

Db 1192 CAGCGTGCCCGTCACCTTGCCCGACGACACCACCGACGTTCGCCTCGAGCAAAATTTCAT 1251

I III I I II II II III III III II II II II 
Qy 888 tgagatccccacaaatcttccagagaccatcacagaaatacgtttggaacagaacacaat 947 

Db 1252 TACGGAACTGCCGCCGAAATCGTTCTCCAGCTTTCGACGACTGCGACGCATCGACCTGTC 1311 

ii i nil i i inn i ii ii mi n nun 

Qy 948 caaagtcatccctcctggagctttctcaccatataaaaagcttagacgaattgacctgag 1007 
Db 1312 CAACAACAACATATCCCGGATTGCCCACGATGCACTAAGCGGCCTAAAGCAGTTAACCAC 1371 

in ii i ii n nn i inn i ii iii iii 

Qy 1008 caataatcagatctctgaacttgcaccagatgctttccaaggactacgctctctgaattc 1067 
Db 1372 TCTCGTGCTGTACGGCAATAAAATAAAGGATTTACCCTCGGGCGTGTTCAAAGGACTCGG 1431

ii ii ii ii ii mum i i! im ii ii mini 

Qy 1068 acttgtcctctatggaaataaaatcacagaactccccaaaagtttatttgaaggactgtt 1127 
Db 1432 CTCGCTCAGGCTGCTGCTGCTGAACGCCAACGAGATCTCGTGCATACGCAAGGATGCCTT 1491

i! i iii ii i nn mm mi in i n inn n

Qy 1128 ttccttacagctcctattattgaatgccaacaagataaactgccttcgggtagatgcttt 1187

Db 1492 TCGCGACCTGCACAGTTTGAGCCTGCTCTCCCTGTACGACAACAACATCCAGTCGCTGGC 1551

ii i! ii nn nn m mmii n mum in i n 

Qy 1188 tcaggatctccacaacttgaaccttctctccctatatgacaacaagcttcagaccatcgc 1247 
Db 1552 TAATGGCACATTCGACGCCATGAAGAGCATGAAAACGGTACATCTGGCCAAGAATCCTTT 1611 

ii ii n ii mi m nn i iii inn nn n n 

Qy 1248 caaggggaccttttcacctcttcgggccattcaaactatgcatttggcccagaacccctt 1307 
Db 1612 CATCTGCGACTGCAATCTGCGCTGGCTGGCCGACTATTTGCACAAAAATCCCATAGAGAC 1671 

ii ii mm nn nm n n in mm n n n inn 

Qy 1308 tatttgtgactgccatctcaagtggctagcggattatctccataccaacccgattgagac 1367 



Db 1672 GAGTGGAGCCCGCTGC 1687

inn inn in

Qy 1368 cagtggtgcccgttgc 1383



RESULT 12 

LOCUS I66494 7218 bp DNA
DEFINITION Sequence 14 from patent US 5670367.
ACCESSION I66494
NID g2724471
VERSION I66494.1 GI:2724471
KEYWORDS
SOURCE Unknown.
ORGANISM Unknown.
Unclassified.
REFERENCE 1 (bases 1 to 7218)
AUTHORS Dorner,F., Scheiflinger,F. and Falkner,F.Gunter.
TITLE Recombinant fowlpox virus
JOURNAL Patent: US 5670367-A 14 23-SEP-1997;
FEATURES Location/Qualifiers
source 1..7218
/organism="unknown"
BASE COUNT 1944 a 1491 c 1486 g 1929 t 368 others 
ORIGIN 

Query Match 1.7%; Score 80; DB 25; Length 7218; 

Best Local Similarity 2.4%; Pred. No. 1.95e-33;

Matches 9; Conservative 221; Mismatches 150; Indels 0; Gaps 0; 
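
The percentage on the "Best Local Similarity" line appears to follow from the tallies on the "Matches" line. The sketch below assumes it is the match count divided by the total number of aligned columns (matches + conservative + mismatches + indels). That definition is inferred from the figures printed in this listing, not taken from the program's documentation, and the function name is hypothetical.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent of aligned columns that are exact matches (assumed definition)."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


# Tallies copied from the "Matches" line of RESULT 12 above.
print(best_local_similarity(9, 221, 150, 0))
```

Under the same assumption, the tallies reported for RESULT 10 (383, 0, 201, 0) and RESULT 13 (91, 0, 23, 0) give values that agree with the percentages printed for those results.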

Db 1052 GAGGGAGCTTGCGATYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1111 

III | | II II : : :::: : : :: : :::: :: ::::::: :::: ::: 
Cp 4586 gagtgtgtttaggacacacacctcgtacagccgcacttcaccactttctcaacctcgtcc 4527 

Db 1112 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1171 
Cp 4526 acaaaggaggagccgtcagtgcattcgaaagagtatttccgccgcttgctcctcagcggt 4467 

Cp 4466 ccacagcactgccctcctgcacacccacctctgcactctaatcgggacaccttcttggtt 4407 

Db 1232 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1291 

Cp 4406 gtttggcaagcagcatagccctgctgcttttggtaataatctcttatcctttcccctcga 4347 

Db 1292 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1351 

Cp 4346 caagagatttctcgatcacagctgtcccccgtgtatccactgctgcattcacagtagggc 4287 

Cp 4286 tgccccagacctgaaagcctgcacttcccatgcttgcacttgatcgcctggcatgggtta 4227 
Db 1412 YYYYYYYYYYYYYYYYYYYY 1431 
Cp 4226 aacagatcctcctcttcatc 4207 



RESULT 13 

LOCUS G41330 297 bp DNA STS 26-AUG-1998
DEFINITION Z1154 Zebrafish AB Danio rerio STS genomic, sequence tagged site.
ACCESSION G41330
NID g3461906
VERSION G41330.1 GI:3461906
KEYWORDS STS.
SOURCE zebrafish.
ORGANISM Danio rerio
Eukaryota; Metazoa; Chordata; Vertebrata; Actinopterygii;
Neopterygii; Teleostei; Euteleostei; Ostariophysi; Cypriniformes;
Cyprinoidea; Cyprinidae; Rasborinae; Danio.
REFERENCE 1 (bases 1 to 297)
AUTHORS Shimoda,N., Knapik,E.W., Ziniti,J., Sim,C., Yamada,E., Kaplan,S.
and Fishman,M.C.
TITLE A genetic linkage map of the zebrafish with 2000 microsatellite
markers
JOURNAL Unpublished (1998)
COMMENT Contact: Mark C. Fishman
Cardiovascular Research Center
Massachusetts General Hospital
Mail code 1494100A, 149 13th Street, Charleston, MA 02129, USA
Fax: 6177265806
Email: fishman@mgh.cvrc.harvard.edu
http://zebrafish.mgh.harvard.edu
Primer A: TCATGATTGTTTGGAATGTAATAGTG
Primer B: TTGAGCGGTAGTCTTCTACGC
STS size: 130
PCR Profile:
Presoak: 94 degrees C for 5.0 minutes
Denaturation: 94 degrees C for 1.0 minute
Annealing: 58 degrees C for 1.0 minute
Polymerization: 72 degrees C for 1.5 minute
PCR Cycles: 27
Thermal Cycler: MJ Research PTC-100
Protocol:
Template: 10 ng
Primer: each 375 nM
dNTPs: each 200 uM
Taq Polymerase: 0.034 units/ul
Total Vol: 10 ul
Buffer:
MgCl2: 1.5 mM
KCl: 50 mM
Tris-HCl: 10 mM
pH: 8.3
Primers are available from Research Genetics Inc
(http://www.resgen.com phone: 800-533-4363).
FEATURES Location/Qualifiers
source 1..297
/organism="Danio rerio"
/strain="AB"
/note="Vector: m13MP19 with added BstXI site; V-type:
Phage; Genomic DNA from a single adult Zebrafish of AB
strain was digested with AluI, Cac8I, HaeIII, NlaVI, or
RsaI. Fragments in the range of 250-500 bp were gel
purified and a BstXI linker was added. The fragments were
cloned into a modified M13mp19 vector and transformed
into E. Coli DH5alpha. Microsatellite sequences were
screened with labeled d(CA)15 and d(GT)15 oligonucleotide
probes."
/db_xref="taxon:7955"
/clone_lib="Zebrafish AB"
/sex="F"
/dev_stage="Adult"
/lab_host="DH5alphaF'IQ"
STS 71..200
primer_bind 71..96
primer_bind complement(180..200)
BASE COUNT 58 a 44 c 88 g 104 t 3 others
ORIGIN



Query Match 1.4%; Score 68; DB 34; Length 297;
Best Local Similarity 79.8%; Pred. No. 5.24e-25;
Matches 91; Conservative 0; Mismatches 23; Indels 0; Gaps 0;

Db 184 AGAAGACTACCGCTCAAAACTAGGTGGGGATTGCTTTGCTGATTTGGCCTGTCCAGAGAA 243

Mill! II I! MIHI II INI II llllllll III llll II II II II 
Qy 1455 agaagattatcgatcaaaattaagtggagactgctttgcggatctggcttgccctgaaaa 1514 

Db 244 ATGTCGATGTAAAGGAACCACAGTGGACTGCTCTGGACAAAAGCTCACCAAGAT 297 

inn in imiiiimii i! nun iinimii in n 

Qy 1515 gtgtcgctgtgaaggaaccacagtagattgctctaatcaaaagctcaacaaaat 1568



RESULT 14 

LOCUS AF088902 721 bp mRNA ROD 09-MAR-1999
DEFINITION Mus musculus SLIT1 protein (Slit1) mRNA, partial cds.
ACCESSION AF088902
NID g4378026
VERSION AF088902.1 GI:4378026
KEYWORDS
SOURCE house mouse.
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria;
Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 721)
AUTHORS Holmes,G.P., Negus,K., Burridge,L., Raman,S., Algar,E., Yamada,T.
and Little,M.H.
TITLE Distinct but overlapping expression patterns of two vertebrate slit
homologs implies functional roles in CNS development and
JOURNAL Mech. Dev. 79 (1-2), 57-72 (1998)
REFERENCE 2 (bases 1 to 721)
AUTHORS Negus,K. and Little,M.H.
TITLE Direct Submission
JOURNAL Submitted (31-AUG-1998) Centre for Molecular and Cellular Biology,
University of Queensland, St. Lucia, Brisbane, Qld 4072, Australia
FEATURES Location/Qualifiers
source 1..721
/organism="Mus musculus"
/db_xref="taxon:10090"
/db_xref="dbEST:AA423126"
gene <1..721
/gene="Slit1"
misc_feature <1..96
/gene="Slit1"
/note="Region: EGF-like repeat 8"
CDS <1..336
/gene="Slit1"
/note="leucine-rich and EGF-like repeat containing
protein; similar to Drosophila slit"
/codon_start=1
/product="SLIT1 protein"
/protein_id="AAD19349.1"
/db_xref="PID:g4378027"
/db_xref="GI:4378027"
/translation="AAFKCHHGQCHISDRGEPYCLCQPGFSGHHCEQENPCMGEIVRE
AIRRQKDYASCATASKVPIMECRGGCGSQCCQPIRSKRRKYVFQCTDGSSFVEEVERH
LECGCRACS"
misc_feature 196..333
/gene="Slit1"
/note="cysteine-rich knot; unclassified site"
BASE COUNT 219 a 153 c 170 g 179 t



Query Match 1.4%; Score 67; DB 32; Length 721;
Best Local Similarity 63.0%; Pred. No. 2.56e-24;
Matches 199; Conservative 0; Mismatches 114; Indels 3; Gaps 3;

Db 8 TCAAGTGCCACCATGGGCAGTGTCACATATCAGATCGAGGGGAGCCCTATTGCCTATGCC 67 

llllllll I llllll llll I llll II III Mlllll || mi 
Qy 4244 tcaagtgcaagcatgggaagtgcaggctttcaggtctggggcagccctactgtgaatgca 4303

Db 68 AGCCTGGCTTCAGTGGCCATCACTGTGAGCAAGAGAATCCATGTATGGGGGAGATAGTCC 127 

Ml llllll 1 1 1 1 1 1 Ml! I I Ml Mill | | 
Qy 4304 gcagtggatacacgggggacagctgtgatcgagaaatctcttgtcgaggggaaaggataa 4363 

Db 128 GTGAAGCCATCCGCCGCCAGAAAGACTACGCCTCTTGTGCCACGGCGTCCAAGGTGCCCA 187 

I II I! Ill I I III II III! II I MUM II 
Qy 4364 gagattattaccaaaagcagcagggctatgctgcttgccaaacaaccaagaaggtgtccc 4423 

Db 188 TCATGGAATGCCGCGG-GGGC-TGCGGGAGC-CAGTGCTGCCAGCCGATTCGAAGCAAGC 244 

I I! Ill I II III Ml MM llllllll III I I Mil 

Qy 4424 gattagagtgcagaggtgggtgtgcaggagggcagtgctgtggaccgctgaggagcaagc 4483 

Db 245 GGCGGAAATATGTCTTCCAGTGCACGGACGGCTCCTCATTCGTGGAAGAGGTGGAGAGAC 304 

1 1 1 1 1 1 1 1 1 1 III I Mill IMIIIIMII M llll! Mill llll I 
Qy 4484 ggcggaaatactctttcgaatgcactgacggctcctcctttgtggacgaggttgagaaag 4543 

Db 305 ACTTGGAATGTGGCTG 320 

II I II Mill 

Qy 4544 tggtgaagtgcggctg 4559 

RESULT 15 

LOCUS I66494 7218 bp DNA 23-DEC-1997
DEFINITION Sequence 14 from patent US 5670367.
ACCESSION I66494
NID g2724471
VERSION I66494.1 GI:2724471
KEYWORDS



Tue Jun 1 10:15:50 1999 US-09-191-647-1.rge Page 27



SOURCE Unknown.
ORGANISM Unknown.
Unclassified.
REFERENCE 1 (bases 1 to 7218)
AUTHORS Dorner,F., Scheiflinger,F. and Falkner,F.Gunter.
TITLE Recombinant fowlpox virus
JOURNAL Patent: US 5670367-A 14 23-SEP-1997;
FEATURES Location/Qualifiers
source 1..7218
/organism="unknown"
BASE COUNT 1944 a 1491 c 1486 g 1929 t 368 others 
ORIGIN 

Query Match 1.2%; Score 57; DB 25; Length 7218;
Best Local Similarity 0.8%; Pred. No. 1.49e-17;
Matches 3; Conservative 208; Mismatches 154; Indels 0; Gaps 0;



i 1123 



III 



Qy 1036 gatgctttccaaggactacgctctctgaattcacttgtcctctatggaaataaaatcaca 1095 

Qy 1096 gaactccccaaaagtttatttgaaggactgttttccttacagctcctattattgaatgcc 1155

Db 1184 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1243

Qy 1156 aacaagataaactgccttcgggtagatgcttttcaggatctccacaacttgaaccttctc 1215

Db 1244 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY 1303

Qy 1216 tccctatatgacaacaagcttcagaccatcgccaaggggaccttttcacctcttcgggcc 1275

Qy 1276 attcaaactatgcatttggcccagaacccctttatttgtgactgccatctcaagtggcta 1335 

Qy 1336 gcggattatctccataccaacccgattgagaccagtggtgcccgttgcaccagcccccgc 1395 
Db 1424 YYYYY 1428 
Qy 1396 cgcct 1400 

Search completed: Sat May 29 20:36:33 1999
Job time: 8593 secs.
Release 3.1A John F. Collins, Biocomputing Research Unit.
Copyright (c) 1993-1998 University of Edinburgh, U.K.
Distribution rights by Oxford Molecular Ltd

MPsrch_nn n.a. - n.a. database search, using Smith-Waterman algorithm

Run on: Sat May 29 22:50:43 1999; MasPar time 1032.12 Seconds

987.779 Million cell updates/sec

Tabular output not generated.



Title: US-09-191-647-1
Description: (1-4758) from US09191647.seq
Perfect Score: 4758
N.A. Sequence: 1 atgcgcggcgttggctggca ... aaaaaaaaaaaaaaactcga 4758
Comp: tacgcgccgcaaccgaccgt ... tttttttttttttttgagct
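
The "Comp:" line shows the base-by-base complement of the query ends printed just above it, which fits the "bases x 2" note on the "Searched:" line (both strands appear to be searched). As a minimal sketch of that pairing, assuming plain Watson-Crick complementation with no reversal, as displayed in this header (the function name is hypothetical):

```python
# Map each lowercase base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("acgt", "tgca")


def complement(seq):
    """Return the base-by-base complement, in the same left-to-right order."""
    return seq.lower().translate(COMPLEMENT)


# First 20 bases of the query as printed in the header.
print(complement("atgcgcggcgttggctggca"))
```

Reading the complemented string right to left gives the reverse complement, which is the strand that hits flagged with "c" in the summaries would be aligned against, if that flag means what it usually does.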



Scoring table: TABLE default 
Gap 6 

Hatch STD : Dbase 0; Query 0 

Searched: 271905 seqs, 107135622 bases x 2 

Post-processing: Minimum Match 0%

Listing first 45 summaries 
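
The throughput figure in the header can be checked against the other numbers it prints. The sketch below assumes that one "cell update" is one dynamic-programming cell of the Smith-Waterman matrix (one query base against one database base) and that "x 2" means both strands were searched. Both are assumptions, and the function name is hypothetical.

```python
def million_cell_updates_per_sec(query_len, db_bases, strands, seconds):
    """Smith-Waterman cells filled per second, in millions (assumed definition)."""
    cells = query_len * db_bases * strands
    return cells / seconds / 1e6


# Query length, database size, strand count and MasPar time from the header.
rate = million_cell_updates_per_sec(4758, 107135622, 2, 1032.12)
print(f"{rate:.3f}")
```

The result lands within a small fraction of a percent of the 987.779 printed in the header; the remaining difference is consistent with the run time being rounded to two decimals.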



n-geneseq35
1:part1 2:part2 3:part3 4:part4 5:part5 6:part6 7:part7
8:part8 9:part9 10:part10 11:part11 12:part12 13:part13
14:part14 15:part15 16:part16 17:part17 18:part18
19:part19 20:part20 21:part21 22:part22 23:part23
24:part24 25:part25 26:part26 27:part27 28:part28
29:part29 30:part30 31:part31 32:part32 33:part33
34:part34 35:part35 36:part36 37:part37 38:part38
39:part39 40:part40 41:part41 42:part42 43:part43
44:part44 45:part45 46:part46 47:part47 48:part48
49:part49 50:part50 51:part51 52:part52 53:part53
54:part54 55:part55 56:part56 57:part57 58:part58
59:part59 60:part60



Statistics: Mean 10.449; Variance 6.610; scale 1.581

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution.
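
The "% Query Match" column in the summaries appears to be each hit's score expressed as a percentage of the Perfect Score (4758 for this query). A minimal sketch under that assumption, which is inferred from the printed figures and not from the program's documentation (the names are hypothetical):

```python
PERFECT_SCORE = 4758  # "Perfect Score" from the search header


def query_match_percent(score, perfect=PERFECT_SCORE):
    """Score as a percentage of the perfect score, to one decimal place."""
    return round(100.0 * score / perfect, 1)


# Scores of the first four hits in the summaries.
for score in (1231, 200, 171, 104):
    print(score, query_match_percent(score))
```

Because the column is rounded to one decimal, hits with different scores (for example 41 and 45) can share the same percentage.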



Result         % Query
   No.  Score    Match  Length  DB  ID      Description            Pred. No.
     1   1231     25.9    5094  41  V16978  Nucleic acid encoding  0.00e+00
     2    200      4.2     684  41  V16966  Nucleic acid sequence  3.22e-106
     3    171      3.6     551  41  V16967  Nucleic acid sequence  2.32e-87
     4    104      2.2    8378   4  Q25811  Drosophila SLIT prote  1.07e-44
     5     54      1.1    5617  51  V57163  Partial human Notch-3  9.57e-15
     6     54      1.1    8091  50  V57001  Human Notch3 cDNA.     9.57e-15
     7     41      0.9      91   9  Q51746  Oligonucleotide probe  1.20e-07
     8     43      0.9      91   9  Q51746  Oligonucleotide probe  1.05e-08
     9     43      0.9     204   1  N81164  Base substituted E.co  1.05e-08
 c  10     44      0.9     204   1  N81164  Base substituted E.co  3.08e-09
    11     45      0.9     341  21  T24530  Human gene signature   8.92e-10
 c  12     37      0.8     114  12  Q70468  Generic DNA sequence   1.41e-05
 c  13     36      0.8     114  12  Q70469  Generic DNA sequence   4.52e-05
    14     32      0.7      91  46  V44650  Mammalian DNA replica  4.23e-03
 c  15     31      0.7      91  46  V44650  Mammalian DNA replica  1.27e-02
    16     34      0.7     114  12  Q70467  Generic DNA sequence   4.49e-04
    17     34      0.7     114  12  Q70468  Generic DNA sequence   4.49e-04
    18     34      0.7     114  12  Q70465  Generic DNA sequence   4.49e-04
    19     34      0.7     114  12  Q70469  Generic DNA sequence   4.49e-04
    20     33      0.7     114  12  Q70466  Generic DNA sequence   1.39e-03
    21     32      0.7     114  12  Q70470  Generic DNA sequence   4.23e-03
 c  22     34      0.7     114  12  Q70467  Generic DNA sequence   4.49e-04
 c  23     34      0.7     114  12  Q70465  Generic DNA sequence   4.49e-04
 c  24     33      0.7     114  12  Q70470  Generic DNA sequence   1.39e-03
 c  25     33      0.7     114  12  Q70472  Generic DNA sequence   1.39e-03
 c  26     32      0.7     114  12  Q70466  Generic DNA sequence   4.23e-03
 c  27     32      0.7     168  32  T76270  Human MDNCF antisense  4.23e-03
 c  28     32      0.7     172  32  T76363  Human interleukin 8 a  4.23e-03
    29     31      0.7     178  32  T76405  Human endothelin-1 an  1.27e-02
    30     31      0.7    2088  28  T58897  C-Delta-1 gene.        1.27e-02
    31     33      0.7    2446  17  T08768  Rat biglycan cDNA.     1.39e-03
    32     35      0.7    2663  35  T70174  Proliferation and dif  1.43e-04
    33     33      0.7    2692  28  T58899  M-Delta-1 gene.        1.39e-03
    34     35      0.7    2883  28  T58898  C-Delta-1 gene (alter  1.43e-04
    35     31      0.7    4208  35  T70175  Proliferation and dif  1.27e-02
    36     31      0.7    4208  40  V15201  Human serrate 1 encod  1.27e-02
    37     31      0.7    5458  39  V03674  Human Jagged gene.     1.27e-02
    38     31      0.7    6464  24  T40090  Human Serrate-1 (HJ1)  1.27e-02
    39     35      0.7    8224   2  Q12261  Versican gene.         1.43e-04
    40     30      0.6     114  12  Q70472  Generic DNA sequence   3.77e-02
    41     29      0.6     114  12  Q70471  Generic DNA sequence   1.10e-01
    42     30      0.6    1506  16  Q95302  Murine Fas antigen ex  3.77e-02
    43     30      0.6    1506  53  V71961  Fas ligand (FasL) pro  3.77e-02
    44     30      0.6    1506  21  T16305  Coding sequence for m  3.77e-02
    45     30      0.6    1515  19  T07072  Adhesive protein gene  3.77e-02


ALIGNMENTS 

RESULT 1 

ID V16978 standard; CDNA to mRNA; 5094 BP, 

AC V16978; 

DT 06-JUL-1998 (first entry) 

DE Nucleic acid encoding a human slit-like polypeptide. 

KW Slit-like protein; human; diagnosis; treatment; brain-specific disease; 

KW cancer; antibody; ds.

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT CDS 233..4837

FT /*tag= a

FT sig_peptide 233..310

FT /*tag= b

FT mat_peptide 311..4834

FT /*tag= c

PN J10087699-A.

PD 07-APR-1998. 

PF 15-JUL-1997; 205351. 

PR 16-JUL-1996; JP-186219. 

PA (ASAH ) ASAHI KASEI KOGYO KK.

DR WPI; 98-267127/24.

DR P-PSDB; W46967. 

PT Human Slit-like protein - useful for diagnosis and treatment of

PT brain-specific diseases and cancers 

PS Disclosure; Pages 24-30; 45pp; Japanese. 

CC The present sequence encodes a novel human slit-like protein (the 

CC mature protein is claimed in Claim 1). The slit-like polypeptide is

CC useful for diagnosis and treatment of brain-specific diseases and

CC cancers. Antibodies directed against the protein, or its fragments

CC can also be used for diagnosing cancer.

SQ Sequence 5094 BP; 1037 A; 1616 C; 1508 G; 933 T; 



Tue Jun 1 10:15:51 1999    US-09-191-647-1.rng    Page 2

Query Match 25.9%; Score 1231; DB 41; Length 5094;

Best Local Similarity 65.0%; Pred. No. 0.00e+00;

Matches 2926; Conservative 0; Mismatches 1551; Indels 24; Gaps 17;


Db 328 ggcgtgccccgccctctgcacctgcaccggaaccacggtggactgccacggcacggggct 387

Qy 78 ggcgtgcccggcgcagtgctcttgctcgggcagcacagtggactgtcacgggctggcgct 137

Db 388 gcaggccattcccaagaatatacctcggaacaccgagcgcctggaactcaatggcaacaa 447

Qy 138 gcgcagcgtgcccaggaatatcccccgcaacaccgagagactggatttaaatggaaataa 197

Db 448 catcactcggatccataagaatgactttgcggggctcaagcagctgcgggtgctgcagct 507

Qy 198 catcacaagaattacgaagacagattttgctggtcttagacatctaagagttcttcagct 257

Db 508 gatggagaaccagattggagcagtggaacgtggtgcttttgatgacatgaaggagctgga 567

Qy 258 tatggagaataagattagcaccattgaaagaggagcattccaggatcttaaagaactaga 317

Db 568 gcggctgcgactgaaccgaaaccagctgcacatgttaccggaactgctgttccagaacaa 627

Qy 318 gagactgcgtttaaacagaaatcaccttcagctgtttcctgagttgctgtttcttgggac 377

Db 628 ccaggctttgtcaagactggacttgagtgagaacgccatccaggccatccccaggaaagc 687

Qy 378 tgcgaagctatacaggcttgatctcagtgaaaaccaaattcaggcaatcccaaggaaagc 437

Db 688 ttttcggggagctacggaccttaaaaatttacggctggacaagaaccagatcagctgcat 747

Qy 438 tttccgtggggcagttgacataaaaaatttgcaactggattacaaccagatcagctgtat 497

Db 748 tgaggaaggggccttccgtgctctgcgggggctggaggtgctgaccctgaacaacaacaa 807

Qy 498 tgaagatggggcattcagggctctccgggacctggaagtgctcactctcaacaataacaa 557

Db 808 tatcaccaccatccccgtgtccagcttcaaccatatgcccaagctacggaccttccgcct 867

Qy 558 cattactagactttctgtggcaagtttcaaccatatgcctaaacttaggacttttcgact 617

Db 868 gcactccaaccacctgttttgcgactgccacctggcctggctctcgcagtggctgaggca 927

Qy 618 gcattcaaacaacctgtattgtgactgccacctggcctggctctccgactggcttcgcaa 677

Db 928 gcggccaaccatcgggctcttcacccagtgctcgggcccagccagcctgcgtggcctcaa 987

Qy 678 aaggcctcgggttggtctgtacactcagtgtatgggcccctcccacctgagaggccataa 737

Db 988 tgtggcagaggtccagaagagtgagttcagctgctcaggccagggagaagcggggc-g-c 1045

Qy 738 tgtagccgaggttcaaaaacgagaatttgtctgcagtgatgaggaagaaggtcaccagtc 797

Db 1046 gtgcccacctgcaccctgtcctccggctc-ctgcccggccatgtgcacctgcagcaatgg 1104

Qy 798 atttatggctccttcttgtagtgttttgcactgccctgccgcctgtacctgtagcaacaa 857

Db 1105 catcgtggactgtcgtggaaaaggcctcactgccatcccggccaacctgcccgagaccat 1164

Qy 858 tatcgtagactgtcgtgggaaaggtctcactgagatccccacaaatcttccagagaccat 917

Db 1165 gacggagatacgcctggagctgaacggcatcaagtccatccctcctggagccttctcacc 1224

Qy 918 cacagaaatacgtttggaacagaacacaatcaaagtcatccctcctggagctttctcacc 977

Db 1225 ctacagaaagctacggaggatagacctgagcaacaatcagatcgctgagattgcacccga 1284

Qy 978 atataaaaagcttagacgaattgacctgagcaataatcagatctctgaacttgcaccaga 1037

Db 1285 cgccttccagggcctccgctccctgaactcgctggtcctctatggaaacaagatcacaga 1344

Qy 1038 tgctttccaaggactacgctctctgaattcacttgtcctctatggaaataaaatcacaga 1097

Db 1345 cctcccccgtggtgtgtttggaggcctatacaccctacagctcctgctcctgaatgccaa 1404



Qy 1098 actccccaaaagtttatttgaaggactgttttccttacagctcctattattgaatgccaa 1157 

Db 1405 caagatcaactgcatccggcccgatgccttccaggacctgcagaacctctcactgctctc 1464 

Qy 1158 caagataaactgccttcgggtagatgcttttcaggatctccacaacttgaaccttctctc 1217 

Db 1465 cctgtatgacaacaagatccagagcctcgccaagggcactttcacctccctgcgggccat 1524 

Qy 1218 cctatatgacaacaagcttcagaccatcgccaaggggaccttttcacctcttcgggccat 1277 

Db 1525 ccagactctgcacctggcccagaaccctttcatttgcgactgtaacctcaagtggctggc 1584 

Qy 1278 tcaaactatgcatttggcccagaacccctttatttgtgactgccatctcaagtggctagc 1337 

Db 1585 agacttcctgcgcaccaatcccatcgagacgagtggtgcccgctgtgccagtccccggcg 1644 

ii i ii i nm ii ii inn minimi n mi inn n 

Qy 1338 ggattatctccataccaacccgattgagaccagtggtgcccgttgcaccagcccccgccg 1397 

Db 1645 cctcgccaacaagcgcatcgggcagatcaagagcaagaagttccggtgctcagccaaaga 1704 

Qy 1398 cctggcaaacaaaagaattggacagatcaaaagcaagaaattccgttgttcaggtacaga 1457 

Db 1705 gcagtacttcattccaggcacggaggattaccagctgaacagcgagtgcaacagcgacgt 1764 

II I II II I I II I llll I II I II III I II I 
Qy 1458 --ag-a-tt-at-cgat-caa--a--attaagtggaga-ctgctt-tgc----g-gatct 1499

Db 1765 ggtctgtccccacaagtgccgctgtgaggccaacgtggtggagtgctccagcctgaagct 1824 

ii ii ii i nm mini i 1 1 ii ii nm i i mil 

Qy 1500 ggcttgccctgaaaagtgtcgctgtgaaggaaccacagtagattgctctaatcaaaagct 1559 

Db 1825 caccaagatccctgagcgcatcccccagtccacggcagaactgcgattgaataacaatga 1884 

Qy 1560 caacaaaatcccggagcacattccccagtacactgcagagttgcgtctcaataataatga 1619 

Db 1885 gatttccatcctggaggccactgggatgtttaaaaaacttacacatctgaagaaaatcaa 1944 

ii ii i mi mil ii ii nm linn i ii i inn 

Qy 1620 atttaccgtgttggaagccacaggaatctttaagaaacttcctcaattacgtaaaataaa 1679 

Db 1945 tctgagcaacaacaaggtgtcagaaattgaagatggggccttcgagggcgcagcctctgt 2004 

I MMII III I llll Mill II II II II II II III I III 
Qy 1680 ctttagcaacaataagatcacagatattgaggagggagcatttgaaggagcatctggtgt 1739 

Db 2005 gagcgagctgcacctaactgccaaccagctggagtccatccggagcggcatgttccgggg 2064 

I II I I IMI III MM I I I mm III 

Qy 1740 aaatgaaatacttcttacgagtaatcgtttggaaaatgtgcagcataagatgttcaaggg 1799 

Db 2065 tctggatggcttgaggaccctaatgctgcggaacaaccgcatcagctgcatccacaacga 2124 

llll II I I II I III II I I III II II I Ml I II II 
Qy 1800 attggaaagcctcaaaactttgatgttgagaagcaatcgaataacctgtgtggggaatga 1859 

Db 2125 cagcttcacgggcctgcgcaacgtccggctcctctcgctctacgacaaccagatcaccac 2184 

III Mil II II I II II I II II I M II II II II II II 
Qy 1860 cagtttcataggactcagttctgtgcgtttgctttctttgtatgataatcaaattactac 1919 

Db 2185 cgtatccccaggagccttcgacaccctccagtccctctccacactgaatctcctggccaa 2244 

II I IIIM II II II II Mill II I II II II II III 1 1 1 1 1 1 1 

Qy 1920 agttgcaccaggggcatttgatactctccattctttatctactctaaacctcttggccaa 1979

Db 2245 ccctttcaactgcaactgccagctggcctggctaggaggctggctacggaagcgcaagat 2304 

mil mil linn i inn in i in nm i in i n 

Qy 1980 tccttttaactgtaactgctacctggcttggttgggagagtggctgagaaagaagagaat 2039 

Db 2305 cgtgacggggaacccgcgatgccagaaccctgactttttgcggcagattcccctgcagga 2364 

II Mill II II llll II II II llll II I II III I Mill 
Qy 2040 tgtcacgggaaatcctagatgtcaaaaaccatacttcctgaaagaaatacccatccagga 2099 

Db 2365 cgtggccttccctgacttcaggtgtgaggaaggccaggaggaggggggctgcctgccccg 2424 

nun I I mini in ii ii i ii m i mi m i 

Qy 2100 tgtggccattcaggacttcacttgtgatgacggaaatgatgacaatagttgctccccact 2159 

Db 2425 cccacagtgcccacaggagtgcgcctgcctggacaccgtggtccgatgcagcaacaagca 2484 

II II II II II I III llll II II IIIIIIM MMIIMI 
Qy 2160 ttctcgctgtcctactgaatgtacttgcttggatacagtcgtccgatgtagcaacaaggg 2219 
Db 2485 cctgcgggccctgcccaagggcattcccaagaatgtcacagaactctatttggacgggaa 2544

II II I llll II II Mill I Illlllllll I III Mil II II 
Qy 2220 tttgaaggtcttgccgaaaggtattccaagagatgtcacagagttgtatctggatggaaa 2279 

Db 2545 ccagttcacgctggttccgggacagctgtctaccttcaagtacctgcagctcgtggacct 2604 

III II II IIIIMII I II II I II III II II I III I 
Qy 2280 ccaatttacactggttcccaaggaactctccaactacaaacatttaacacttatagactt 2339 

Db 2605 gagcaacaacaagatcagttccttaagcaattcctccttcaccaacatgagccagctgac 2664 

II IIMII! II II I I III 1 1 1 1 1 llllllll llllll 
Qy 2340 aagtaacaacagaataagcacgctttctaatcagagcttcagcaacatgacccagctcct 2399 

Db 2665 cactctgatcctcagctacaatgccctgcagtgcatcccgcctttggccttccagggact 2724 

III I II II II lllll III II II II III llll I III I 
Qy 2400 caccttaattcttagttacaaccgtctgagatgtattcctcctcgcacctttgatggatt 2459 




Db 2725 ccgctccctgcgcctgctgtctctccacggcaatgacatctccaccctccaagagggcat 2784



Qy 2460 aaagtctcttcgattactttctctacatggaaatgacatttctgttgtgcctgaaggtgc 2519 

Db 2785 ctttgcagacgtgacctccctgtctcacctggccattggtgccaaccccctatactgtga 2844 

II II II I I II II II II Mill millll II llllllll 

Qy 2520 tttcaatgatctttctgcattatcacatctagcaattggagccaaccctctttactgtga 2579 

Db 2845 ctgccacctccgctggctgtccagctgggtgaagactggctacaaggaaccgggcattgc 2904 

ii ii i i m i iii minim i i ii inn n n iiiii 

Qy 2580 ttgtaacatgcagtggttatccgactgggtgaagtcggaatataaggagcctggaattgc 2639 

Db 2905 tcgttgtgctgggccccaggacatggagggcaagctgctcctcaccacgcctgccaagaa 2964 

imiiinm n n mi i n n i inn n n mi n 

Qy 2640 tcgttgtgctggtcctggagaaatggcagataaacttttactcacaactccctccaaaaa 2699 

Db 2965 gtttgaatgccaaggtcctccaacgctggctgtccaggccaagtgtgatctctgcttgtc 3024 

in ii 1 1 1 1 1 1 1 1 1 i i ii ii mm 1 1 mi i n 

Qy 2700 atttacctgtcaaggtcctgtggatgtcaatattctagctaagtgtaacccctgcctatc 2759 

Db 3025 cagtccgtgccagaaccagggcacctgccacaacgacccccttgaggtgtacaggtgcgc 3084 

I MIMI I II I lllll II I I II II llll I III I III I 

Qy 2760 aaatccgtgtaaaaatgatggcacatgtaatagtgatccagttgacttttaccgatgcac 2819 

Db 3085 ctgccccagcggctataagggtcgagactgtgaggtgtccctgaacagctgttccagtgg 3144 

III II III lllll I llllllll lllll III llll 

Qy 2820 ctgtccatatggtttcaaggggcaggactgtgatgtcccaattcatgcctgcatcagtaa 2879 

Db 3145 cccctgtgaaaatgggggcacctgccatgcacaggagggcgaggatgccccgttcacgtg 3204
Mi III II MM II II lllll I llll II II II I III III 

Qy 2880 cccatgtaaacatggaggaacttgccacttaaaggaaggagaagaagatggattctggtg 2939

Db 3205 ctcctgtcccaccggctttgaaggaccaacctgtggggtgaacacagatgactgtgtgga 3264 

III I II lllllllll II llll II III lllll llll II 

Qy 2940 tatttgtgctgatggatttgaaggagaaaattgtgaagtcaacgttgatgattgtgaaga 2999 

Db 3265 tcatgcctgtgccaatgggggcgtctgtgtggatggtgtgggcaactacacctgccagtg 3324 

I III lllll III lllll lllll I llllllll Nil II 

Qy 3000 taatgactgtgaaaataattctacatgtgtcgatggcattaataactacacatgcctttg 3059 

Db 3325 ccccctgcagtatgagggaaaggcctgtgagcagctggtggacttgtgctctccggatct 3384 

iii i inn ii ii iiiiii ii i mini n i i m n 

Qy 3060 cccacctgagtatacaggtgagttgtgtgaggagaagctggacttctgtgcccaggacct 3119 

Db 3385 gaacccatgtcaacacgaggcccagtgtgtgggcaccccggatgggcccaggtgtgagtg 3444 

llllll II II lllll I llll I II II I II II lllll II 

Qy 3120 gaacccctgccagcacgattcaaagtgcatcctaactccaaagggattcaaatgtgactg 3179 

Db 3445 catgccaggttatgcaggtgacaactgcagtgagaaccaggatgactgcagggaccaccg 3504 

ii iiiii n i mm inn i n mm in n 

Qy 3180 cacaccagggtacgtaggtgaacactgcgacatcgattttgacgactgccaagacaacaa 3239 

Db 3505 ctgccagaatggggcccagtgtatggatgaagtcaacagctactcctgcctctgtgctga 3564 

n i ii n mn n i nn in m mi i in in in 

Qy 3240 gtgtaaaaacggagcccactgcacagatgcagtgaacggctatacgtgcatatgccccga 3299 



Db 3565 gggctacagtggacagctctgtgagatccctcc--ccatctgcctgcccccaag-agccc 3621 

II llllllll I llllllll I llll III! Ill II I I lllll 
Qy 3300 aggttacagtggcttgttctgtgagttttctccacccatggtcctccctcgtaccagccc 3359 

Db 3622 ctgtgaggggactgagtgccagaatggggccaactgtgtggaccagggcaacaggcctgt 3681 

llllll III II llllllll II I III I I I II III I 
Qy 3360 ctgtgataattttgattgtcagaatggagctcagtgtatcgtcagaataaatgagccaat 3419 

Db 3682 gtgccagtgcctcccaggcttcggtggccctgagtgtgagaagttgctcagtgtcaactt 3741 

II lllll I II MM II IIIIMI II Ml I lllll II II 
Qy 3420 atgtcagtgtttgcctggctatcagggagaaaagtgtgaaaaattggttagtgtgaattt 3479 

Db 3742 tgtggatcgggacacttacctgcagttcactgacctgcaaaactggccacgggccaacat 3801 

III II llll II III I II I MM I I I lllll 
Qy 3480 tataaacaaagagtcttatcttcagattccttcagccaaggttcggcctcagacgaacat 3539 

Db 3802 cacgttgcaggtctccacggcagaggacaatgggatccttctgtacaacggggacaacga 3861 

M I III I llll I II llll II Mill lllll II II lllll II 
Qy 3540 aacacttcagattgccacagatgaagacagcggaatcctcctgtataagggtgacaaaga 3599 

Db 3862 ccacattgcagttgagctgtaccagggccatgtgcgtgtcagctacgacccaggcagcta 3921 

III II II II M II II I III I III llll llllll III I III I 
Qy 3600 ccatatcgcggtagaactctatcgggggcgtgttcgtgccagctatgacaccggctctca 3659 

Db 3922 ccccagctctgccatctacagtgctgagacgatcaacgatgggcaattccacaccgttga 3981 

ii mniM inn mn mn iiiii i mini n n 

Qy 3660 tccagcttctgccatttacagtgtggagacaatcaatgatggaaacttccacattgtgga 3719 

Db 3982 gctggttgcctttgaccagatggtgaatctctccattgatggcgggagccccatgaccat 4041 

II Hilll II MM I I! Ill I lllll llll lllll I III 

Qy 3720 actacttgccttggatcagagtctctctttgtccgtggatggtgggaaccccaaaatcat 3779 

Db 4042 ggacaactttggcaaacattacacgctcaacagcgaggcgccactctatgtgggagggat 4101 

Mill II II I III II II II I MIMIMMI lllll II 
Qy 3780 cactaacttgtcaaagcagtccactctgaattttgactctccactctatgtaggaggcat 3839 

Db 4102 gcccgtggatgtcaactcagctgccttccgcctgtggcagatcctcaacggcaccggctt 4161 

III I I I III II I I llll I I I lllll III Mil 
Qy 3840 gccagggaagagtaacgtggcatctctgcgccaggcccctgggcagaacggaaccagctt 3899 

Db 4162 ccacggttgcatccgaaacctgtacatcaacaacgagctgcaggacttcaccaagacgca 4221 

llllll llll II II lllll Illlllllll lllimilllllll Ml II 
Qy 3900 ccacggctgcatccggaacctttacatcaacagtgagctgcaggacttccagaaggtgcc 3959 

Db 4222 gatgaagccaggcgtggtgccaggctgcgaaccctgccgcaagctctactgcctgcatgg 4281 

llll I lllll I llll lllll II II llll llll II lllll 
Qy 3960 gatgcaaacaggcattttgcctggctgtgagccatgccacaagaaggtgtgtgcccatgg 4019 

Db 4282 catctgccagcccaatgccaccccagggcccatgtgccactgcgaggctggctgggtggg 4341 

II Illlllllll I llll II III I III III II III llll 
Qy 4020 cacatgccagcccagcagccaggcaggcttcacctgcgagtgccaggaaggatggatggg 4079 

Db 4342 cctgcactgtgaccagcccgctgacggcccctgccatggccacaagtgtgtccatgggca 4401 

i i iiiiiiiii i i i i in mi in i ii ii ii mn 

Qy 4080 gcccctctgtgaccaacggaccaatgacccttgccttggaaataaatgcgtacatggcac 4139 

Db 4402 atgcgtgcccctcgacgctctttcctacagctgccagtgccaggatgggtactcgggggc 4461 

III lllll II I II I MIMimil lllll III II I III 

Qy 4140 ctgcttgcccatcaatgcgttctcctacagctgtaagtgcttggagggccatggaggtgt 4199 

Db 4462 actgtgcaaccaggccggggccctggcagagccctgcagaggcctgcagtgcctgcatgg 4521 

II II I II I II III I II III I I lllll mm 
Qy 4200 cctctgtgatgaagaggaggatctgtttaacccatgccaggcgatcaagtgcaagcatgg 4259 

Db 4522 ccactgccaggcctcaggcaccaagggggcacactgtgtgtgtgaccccggcttttcggg 4581 

i iii i inn i 1 1 mm n i n i mi 

Qy 4260 gaagtgcaggctttcaggtctggggcagccctactgtgaatgcagcagtggatacacggg 4319 

Db 4582 cgagctgtgtgagcaagagtccgagtgccggggggaccctgtccgggactttcaccaggt 4641 

II Mill I III I II II Mill I lllll llll 
Qy 4320 ggacagctgtgatcgagaaatctcttgtcgaggggaaaggataagagattattaccaaaa 4379 

Db 4642 ccagaggggctatgccatctgccagaccacgcgccccctgtcatgggtggagtgccgggg 4701 
III IIMIIMI lllll II II Mil I I Mill I II 
Qy 4380 gcagcagggctatgctgcttgccaaacaaccaagaaggtgtcccgattagagtgcagagg 4439 

Db 4702 ctcgtgcccaggccagggctgctgccagggccttcggctgaagcggaggaagttcacctt 4761 

III MM I Mill II II INI MM I | | || 
Qy 4440 tgggtgtgcaggagggcagtgctgtggaccgctgaggagcaagcggcggaaatactcttt 4499 

Db 4762 tgagtgcagcgatgggacctcttttgccgaggaggtggaaaagcccaccaagtgtggctg 4821 

II I M I II II Mil MM II Mill II II lllll lllll 
Qy 4500 cgaatgcactgacggctcctcctttgtggacgaggttgagaaagtggtgaagtgcggctg 4559 

Db 4822 t 4822 

I 

Qy 4560 t 4560 



RESULT 2 

ID V16966 standard; cDNA to mRNA; 684 BP.

AC V16966;

DT 06-JUL-1998 (first entry)

DE Nucleic acid sequence of the specification.

KW Slit-like protein; human; diagnosis; treatment; brain-specific disease;

KW cancer; antibody; ds.

OS Mus sp. 

PN J10087699-A. 

PD 07-APR-1998. 

PF 15-JUL-1997; 205351. 

PR 16-JUL-1996; JP-186219. 

PA (ASAH ) ASAHI KASEI KOGYO KK.

DR WPI; 98-267127/24.

DR P-PSDB; W46967.

PT Human Slit-like protein - useful for diagnosis and treatment of

PT brain-specific diseases and cancers

PS Disclosure; Pages 35-36; 45pp; Japanese. 

CC The present sequence appears in the specification. The specification 

CC describes a novel human slit-like protein (the mature protein is claimed 

CC in Claim 1). The slit-like polypeptide is useful for diagnosis and 

CC treatment of brain-specific diseases and cancers. Antibodies directed

CC against the protein, or its fragments can also be used for diagnosing 

CC cancer. 

SQ Sequence 684 BP; 143 A; 206 C; 199 G; 136 T; 



Query Match 4.2%; Score 200; DB 41; Length 684;

Best Local Similarity 66.0%; Pred. No. 3.22e-106;

Matches 413; Conservative 0; Mismatches 213; Indels 0; Gaps 0;



Db 4 gaggacaatgggattctcctatataatggggacaatgaccacattgcagttgagctgtac 63
M III I M II IMM lllll II Mill lllll II II II II II II 
Qy 3562 gaagacagcggaatcctcctgtataagggtgacaaagaccatatcgcggtagaactctat 3621

Db 64 cagggccatgtccgtgttagttatgacccaggcagctaccccagctctgctatctacagc 123
I Ml I III MM II IIM I Ml I || IMM || IMM 
Qy 3622 cgggggcgtgttcgtgccagctatgacaccggctctcatccagcttctgccatttacagt 3681 

Db 124 gctgagacgatcaacgatgggcagttccacaccgttgagctggtcaccttcgaccagatg 183 

I Mill lllll lllll I MIIMI II II II I IIM II MM 

Qy 3682 gtggagacaatcaatgatggaaacttccacattgtggaactacttgccttggatcagagt 3741 

Db 184 gtgaacctctccatcgatggcggcagccccatgaccatggacaactttggcaagcactac 243 

I I IIM lllll II I IMM I Ml Mill lllll I I 
Qy 3742 ctctctttgtccgtggatggtgggaaccccaaaatcatcactaacttgtcaaagcagtcc 3801 

Db 244 accctcaacagcgaggcccccctctatgtgggagggatgcctgtggacgtgaactcagct 303 

II II M 1 1 I M 1 1 1 1 1 1 ! I lllll lllll IN Ml II 

Qy 3802 actctgaattttgactctccactctatgtaggaggcatgccagggaagagtaacgtggca 3861 

Db 304 gccttccgcctgtggcagatcctcaatggtaccagcttccacggttgcatccggaacctg 363

I I MM II I II II MIMIIMIMM IMMMMIIIII 
Qy 3862 tctctgcgccaggcccctgggcagaacggaaccagcttccacggctgcatccggaacctt 3921



Db 364 tacatcaacaatgaactgcaggacttcaccaagacacagatgaagccaggcgtggtgcca 423 

IIMIMIII III IIIIIIMIIII III I 1 1 1 1 I Mill I IIM 
Qy 3922 tacatcaacagtgagctgcaggacttccagaaggtgccgatgcaaacaggcattttgcct 3981 



Db 424 ggctgtgagccctgccgcaaactctactgtctacatggcatttgccagcccaacgccacc 483

IMIIIIMM IIM III III Mill MINIMI I I 
Qy 3982 ggctgtgagccatgccacaagaaggtgtgtgcccatggcacatgccagcccagcagccag 4041

Db 484 ccagggcctgtgtgccactgtgaggctggctgggggggcctgcactgtgaccagccagtg 543

MM III I II III II III Ml I I IMIIMM I 
Qy 4042 gcaggcttcacctgcgagtgccaggaaggatggatggggcccctctgtgaccaacggacc 4101

Db 544 gatggcccctgccatggccacaagtgtgtccatgggaaatgcgtgccgctcgacgctctt 603

Ml Ml Mil III I II II II lllll I III MM II I II I 
Qy 4102 aatgacccttgccttggaaataaatgcgtacatggcacctgcttgcccatcaatgcgttc 4161



Db 604 gcctacagctgccagtgccaggatgg 629 

IIMIMIII Mill III II 
Qy 4162 tcctacagctgtaagtgcttggaggg 4187 



RESULT 3 

ID V16967 standard; cDNA to mRNA; 551 BP. 

AC V16967; 

DT 06-JUL-1998 (first entry) 

DE Nucleic acid sequence of the specification.

KW Slit-like protein; human; diagnosis; treatment; brain-specific disease; 

KW cancer; antibody; ds. 

OS Rattus sp. 

PN J10087699-A. 

PD 07-APR-1998.

PF 15-JUL-1997; 205351.

PR 16-JUL-1996; JP-186219. 

PA (ASAH ) ASAHI KASEI KOGYO KK. 

DR WPI; 98-267127/24. 

DR P-PSDB; W46968. 

PT Human Slit-like protein - useful for diagnosis and treatment of

PT brain-specific diseases and cancers

PS Disclosure; Page 37; 45pp; Japanese. 

CC The present sequence appears in the specification. The specification 

CC describes a novel human slit-like protein (the mature protein is claimed 

CC in Claim 1). The slit-like polypeptide is useful for diagnosis and

CC treatment of brain-specific diseases and cancers. Antibodies directed

CC against the protein, or its fragments can also be used for diagnosing

CC cancer.

SQ Sequence 551 BP; 121 A; 175 C; 149 G; 106 T; 

Query Match 3.6%; Score 171; DB 41; Length 551; 

Best Local Similarity 65.6%; Pred. No. 2.32e-87;

Matches 360; Conservative 0; Mismatches 189; Indels 0; Gaps 0; 

Db 2 gggccatgtccgtgttagctacgacccaggcagctaccccagctctgctatctacagtgc 61 

III I III Mil lllll III I III I II lllll II MUM 
Qy 3624 ggggcgtgttcgtgccagctatgacaccggctctcatccagcttctgccatttacagtgt 3683 

Db 62 tgaaacaatcaacgatgggcagttccacacagttgagctggtgacctttgaccagatggt 121 

II llllllll Mill I lllllll II II II I MM II MM I 
Qy 3684 ggagacaatcaatgatggaaacttccacattgtggaactacttgccttggatcagagtct 3743 

Db 122 gaacctctccatcgatggtggcagccccatgaccatggacaactttggaaagcactacac 181 

I III I llllllll I Mill I Ml lllll MUM I Ml 
Qy 3744 ctctttgtccgtggatggtgggaaccccaaaatcatcactaacttgtcaaagcagtccac 3803 

Db 182 actcaacagtgaggcccccctctatgtgggagggatgcccgtggatgtgaactcagctgc 241 

II II III I II llllllll Mill lllll II I III II I 
Qy 3804 tctgaattttgactctccactctatgtaggaggcatgccagggaagagtaacgtggcatc 3863 

Db 242 cttccgcctgtggcagatcctcaatggcaccagcttccacggttgcatccggaatctata 301 

I MM II Ml II 1 1 1 1 1 1 1 1 1 1 M 1 1 IMIIMM M II 
Qy 3864 tctgcgccaggcccctgggcagaacggaaccagcttccacggctgcatccggaaccttta 3923 

Db 302 catcaacaacgaactgcaggacttcaccaagacacagatgaagccgggcgtggtgcccgg 361 

MINIM II IIIIIIMIIII Ml I MM I I III I MM II 
Qy 3924 catcaacagtgagctgcaggacttccagaaggtgccgatgcaaacaggcattttgcctgg 3983 

Db 362 ctgcgagccctgccgaaaactctactgtctacatggcatttgccagcccaacgccacccc 421 
in inn mi ii in mini immiii i i i 

Qy 3984 ctgtgagccatgccacaagaaggtgtgtgcccatggcacatgccagcccagcagccaggc 4043 

Db 422 agggcccgtgtgccactgcgaggctggctgggggggcctgcactgtgaccagccagtgga 481 

III I III I III III II III III I I lllllllll I I 
Qy 4044 aggcttcacctgcgagtgccaggaaggatggatggggcccctctgtgaccaacggaccaa 4103 

Db 482 cggcccctgccatggccacaagtgtgtccatgggaaatgcgtgcccctcgatgcacttgc 541 

i iii 1 1 1 1 iii i ii ii ii inn i iii inn ii mi i i 

Qy 4104 tgacccttgccttggaaataaatgcgtacatggcacctgcttgcccatcaatgcgttctc 4163

Db 542 ctacagctg 550 

1 1 ' I ! 1 1 1 : 
Qy 4164 ctacagctg 4172 
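The "Best Local Similarity" figures in these listings appear consistent with the number of matches divided by the total number of aligned columns (matches + conservative + mismatches + indels). This is an inference from the printed counts, not a documented definition of the search program, and the function name below is hypothetical. A minimal Python sketch, using the counts printed for RESULT 3:

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percentage of aligned columns that are exact matches, to one decimal.

    Assumption: the denominator is every aligned column, including
    conservative substitutions and indel positions.
    """
    columns = matches + conservative + mismatches + indels
    if columns == 0:
        raise ValueError("alignment has no columns")
    return round(100.0 * matches / columns, 1)


# Counts copied from the RESULT 3 statistics line:
# Matches 360; Conservative 0; Mismatches 189; Indels 0
print(best_local_similarity(360, 0, 189, 0))  # prints 65.6
```

The same rule reproduces the percentages printed for the other results whose counts are legible, which is why the all-columns denominator was chosen over an identity-only one.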



RESULT 4

ID Q25811 standard; cDNA; 8378 BP.
AC Q25811;
DT 05-JAN-1993 (first entry)

DE Drosophila SLIT protein involved in axon pathway development. 

KW Neurogenesis; EGF-like repeats; epidermal growth factor; TAGON; 

KW embryonic CNS; leucine-rich repeat; Flank-LRR-Flank; glial cells;

KW Notch; axonogenesis; cell-cell interaction; ss. 

OS Drosophila melanogaster. 

FH Key Location/Qualifiers 

FT 5'utr 1..314

FT /*tag= a

FT CDS 315..4757

FT /*tag= b

FT /product= SLIT_protein

FT 3'utr 4755..8378

FT /*tag= c

PN WO9210518-A. 

PD 25-JUN-1992. 

PF 27-NOV-1991; U09055. 

PR 07-DEC-1990; US-624135. 

PA (D7YA ) UNIV YALE. 
PI Artavanis-Tsakonas S, Rothberg JM;

DR WPI; 92-234590/28. 

DR P-PSDB; R25079. 

PT SLIT protein and sequence elements for treating 

PT neuro-degenerative disease - useful for Alzheimer's disease, 

PT nerve damage and Parkinson's disease, for diagnosis of cancer 

PS Claim 2; Page 74-83; 122pp; English. 

CC PCR and standard library screening techniques were used to further
CC analyse previously isolated slit cDNA clones. A cDNA clone
CC representing the 5' most 2.4kb of sequence was isolated from a

CC larval library and PCR was used to isolate a corresp. sequence from 

CC a 4-8 hour embryonic library. Two forms of the SLIT message were 

CC identified differing by 33 nucleotides. Genomic and cDNA sequencing 

CC indicates the transcripts consists of an approximately 314bp 5' 

CC untranslated leader sequence, followed by either a 4407bp or 4440bp

CC ORF depending on the splice form and a 4kb 3'-UTR.

SQ Sequence 8378 BP; 2192 A; 2272 C; 2164 G; 1742 T;

Query Match 2.2%; Score 104; DB 4; Length 8378; 

Best Local Similarity 59.4%; Pred. No. 1.07e-44;

Matches 330; Conservative 0; Mismatches 226; Indels 0; Gaps 0; 

Db 1196 ctgtccgcacccatgtcgctgtgcggacgggatcgtcgattgccgtgagaagagtctgac 1255 

iii ii i nil mi. ii inn ii ii mi iii mi ii 

Qy 828 ctgccctgccgcctgtacctgtagcaacaatatcgtagactgtcgtgggaaaggtctcac 887 

Db 1256 cagcgtgcccgtcaccttgcccgacgacaccaccgacgttcgcctcgagcaaaatttcat 1315 

Mil I I II II II III II I II I II II II II 
Qy 888 tgagatccccacaaatcttccagagaccatcacagaaatacgtttggaacagaacacaat 947 

Db 1316 tacggaactgccgccgaaatcgttctccagctttcgacgactgcgacgcatcgacctgtc 1375 

ii inn ii inn i i i ii im n nun 

Qy 948 caaagtcatccctcctggagctttctcaccatataaaaagcttagacgaattgacctgag 1007 
Db 1376 caacaacaacatatcccggattgcccacgatgcactaagcggcctaaagcagttaaccac 1435 



III II I II II Mil I lllll I It III III 
Qy 1008 caataatcagatctctgaacttgcaccagatgctttccaaggactacgctctctgaattc 1067 

Db 1436 tctcgtgctgtacggcaataaaataaaggatttaccctcgggcgtgttcaaaggactcgg 1495 

II II II II II MINI 1 1 I II ' Mil | | || Mill 
Qy 1068 acttgtcctctatggaaataaaatcacagaactccccaaaagtttatttgaaggactgtt 1127 

Db 1496 ctcgctcaggctgctgctgctgaacgccaacgagatctcgtgcatacgcaaggatgcctt 1555 

ii i iii ii i mi nun nn in i n inn n 

Qy 1128 ttccttacagctcctattattgaatgccaacaagataaactgccttcgggtagatgcttt 1187 
Db 1556 tcgcgacctgcacagtttgagcctgctctccctgtacgacaacaacatccagtcggtggc 1615 

ii ii ii mi im iii Minn n minn i in i i n 

Qy 1188 tcaggatctccacaacttgaaccttctctccctatatgacaacaagcttcagaccatcgc 1247 
Db 1616 taatggcacattcgacgccatgaagagcatgaaaacggtacatctggccaagaatccttt 1675 

ii ii ii it ill in mi i iii mil mi n n 

Qy 1248 caaggggaccttttcacctcttcgggccattcaaactatgcatttggcccagaacccctt 1307 
Db 1676 catctgcgactgcaatctgcgctgggtggccgactatttgcacaaaaatcccatagagac 1735 

ii n nun mi iii i ii ii iii mm ii n ii inn 

Qy 1308 tatttgtgactgccatctcaagtggctagcggattatctccataccaacccgattgagac 1367 

Db 1736 gagtggagcccgctgc 1751 

llll II III II II 
Qy 1368 cagtggtgcccgttgc 1383 
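Each database record above is a series of lines that begin with a two-letter code (ID, AC, DT, DE, KW, OS, FT, PN, and so on). A minimal Python sketch of grouping such lines by code follows; `parse_record` is a hypothetical helper written for illustration, not part of the search software, and it assumes a code is exactly two uppercase letters followed by a space.

```python
from collections import defaultdict


def parse_record(lines):
    """Group record lines by their two-letter line code.

    Lines that are blank or do not start with two uppercase letters
    and a space (for example alignment or statistics lines) are skipped.
    """
    fields = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if len(line) < 3 or line[2] != " ":
            continue
        code = line[:2]
        if not (code.isalpha() and code.isupper()):
            continue
        fields[code].append(line[3:].strip())
    return dict(fields)


# Sample lines copied from the RESULT 4 record
sample = [
    "KW Neurogenesis; EGF-like repeats; epidermal growth factor; TAGON;",
    "",
    "KW Notch; axonogenesis; cell-cell interaction; ss.",
    "OS Drosophila melanogaster.",
    "PN WO9210518-A.",
    "PD 25-JUN-1992.",
]
record = parse_record(sample)
print(record["OS"])  # prints ['Drosophila melanogaster.']
```

Keeping repeated codes as lists preserves multi-line fields such as KW and CC in their original order.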



RESULT 5 

ID V57163 standard; DNA; 5617 BP. 

AC V57163; 

DT 06-JAN-1999 (first entry) 

DE Partial human Notch-3 gene.

KW Human; Notch3; transmembrane receptor; lateral inhibition; regulation; 

KW developmental cascade; neurogenic gene; mutant; neurological disorder; 

KW cerebral autosomal dominant arteriopathy; subcortical infarct; CADASIL; 

KW leukoencephalopathy; therapy; ss. 

OS Homo sapiens. 

FH Key Location/Qualifiers 

FT CDS 1..5616

FT /*tag= a

FT /product= "partial human Notch-3"

FT /transl_except= (pos: 982..984, aa: Xaa)

FT /transl_except= (pos: 1201..1203, aa: Xaa)

FT /transl_except= (pos: 1207..1209, aa: Xaa)

FT /transl_except= (pos: 1216..1218, aa: Xaa)

FT /transl_except= (pos: 1225..1227, aa: Xaa)

FT /transl_except= (pos: 1258..1260, aa: Xaa)

FT /transl_except= (pos: 2116..2118, aa: Xaa)

FT /transl_except= (pos: 2182..2184, aa: Xaa)

FT /transl_except= (pos: 2275..2367, aa: Xaa)

FT /transl_except= (pos: 4273..4275, aa: Xaa)

FT /note= "Xaa = unknown; no start or stop codons are given

FT at the 5' or 3' ends of the sequence" 

PN FR2751985-A1.

PD 06-FEB-1998.

PF 01-AUG-1996; 009733.

PR 01-AUG-1996; FR-009733.

PA (INRM ) INSERM INST NAT SANTE & RECH MEDICALE.

PI Bach JF, Bousser MG, Joutel A, Tournier Lasserve E; 

DR WPI; 98-133137/13. 

DR P-PSDB; W68510. 

PT Human Notch3 nucleic acids - and methods for identifying

PT pre-disposition to cerebral autosomal dominant arteriopathy with

PT sub-cortical infarcts and leukoencephalopathy 

PS Claim 2; Fig la-lg; 42pp; French. 

CC This sequence represents the sequence encoding a partial human notch3 

CC protein. Notch3 is a transmembrane receptor protein involved in lateral 

CC inhibition and regulating developmental cascades of neurogenic genes. 

CC Mutated Notch3 proteins are thought to be involved in neurological 

CC disorders, especially of the cerebral autosomal dominant arteriopathy 

CC with subcortical infarcts and leukoencephalopathy (CADASIL) type. 

CC Blocking expression of a mutated Notch3 gene or by substitution therapy 
CC with non-mutated Notch3 gene or protein can be used to treat CADASIL or
CC related disorders. 

SQ Sequence 5617 BP; 859 A; 1824 C; 1790 G; 1033 T; 



Query Match 1.1%; Score 54; DB 51; Length 5617;

Best Local Similarity 65.0%; Pred. No. 9.57e-15; 

Matches 117; Conservative 0; Mismatches 63; Indels 0; Gaps 0; 



Db 480 tgggtttgagggtcagaattgtgaagtgaacgtggacgactgtccaggacaccgatgtct 539

Qy 2952 tggatttgaaggagaaaattgtgaagtcaacgttgatgattgtgaagataatgactgtga 3011

Db 540 caatggggggacatgcgtggatggcgtcaacacctataactgccagtgccctcctgagtg 599

Qy 3012 aaataattctacatgtgtcgatggcattaataactacacatgcctttgcccacctgagta 3071

Db 600 gacaggccagttctgcacggaggacgtggatgagtgtcagctgcagcccaacgcctgcca 659

Qy 3072 tacaggtgagttgtgtgaggagaagctggacttctgtgcccaggacctgaacccctgcca 3131



RESULT 6
ID V57001 standard; cDNA; 8091 BP.
AC V57001; 

DT 21-DEC-1998 (first entry) 
DE Human Notch3 cDNA. 

KW Human; Notch3; transmembrane receptor; lateral inhibition; regulation; 
KW developmental cascade; neurogenic gene; mutant; neurological disorder; 
KW cerebral autosomal dominant arteriopathy; subcortical infarct; CADASIL; 
KW leukoencephalopathy; therapy; ss. 
OS Homo sapiens. 



FH Key Location/Qualifiers 

FT CDS 79..7044

FT /*tag= a

FT /product= "Notch3 protein"



PN FR2751986-A1. 

PD 06-FEB-1998. 

PF 16-APR-1997; 004680. 

PR 01-AUG-1996; FR-009733. 

PA (INRM ) INSERM INST NAT SANTE & RECH MEDICALE.

PI Bach JF, Bousser MG, Joutel A, Tournier Lasserve E; 

DR WPI; 98-133138/13. 

DR P-PSDB; W49698.

PT Human Notch3 nucleic acids - and methods for identifying 

PT pre-disposition to cerebral autosomal dominant arteriopathy with 

PT sub-cortical infarcts and leukoencephalopathy 

PS Claim 2; Fig 1.1-1.8; 45pp; French.

CC This sequence represents the cDNA sequence encoding the human notch3
CC protein. Notch3 is a transmembrane receptor protein involved in lateral
CC inhibition and regulating developmental cascades of neurogenic genes.

CC Mutated Notch3 proteins are thought to be involved in neurological

CC disorders, especially of the cerebral autosomal dominant arteriopathy

CC with subcortical infarcts and leukoencephalopathy (CADASIL) type.

CC Blocking expression of a mutated Notch3 gene or by substitution therapy 

CC with non-mutated Notch3 gene or protein can be used to treat CADASIL or

CC related disorders. 

SQ Sequence 8091 BP; 1311 A; 2716 C; 2488 G; 1576 T; 

Query Match 1.1%; Score 54; DB 50; Length 8091; 

Best Local Similarity 65.0%; Pred. No. 9.57e-15;

Matches 117; Conservative 0; Mismatches 63; Indels 0; Gaps 0; 

Db 756 tgggtttgagggtcagaattgtgaagtgaacgtggacgactgtccaggacaccgatgtct 815 

m mn ii i iiiiiiiiiii inn n n m n i in 

Qy 2952 tggatttgaaggagaaaattgtgaagtcaacgttgatgattgtgaagataatgactgtga 3011 
Db 816 caatggggggacatgcgtggatggcgtcaacacctataactgccagtgccctcctgagtg 875 

in mil ii iiiiii i ii i in i nn inn mini 

Qy 3012 aaataattctacatgtgtcgatggcattaataactacacatgcctttgcccacctgagta 3071 

Db 876 gacaggccagttctgcacggaggacgtggatgagtgtcagctgcagcccaacgcctgcca 935 

HIM MM II llll I llll III I I I I III Mill 
Qy 3072 tacaggtgagttgtgtgaggagaagctggacttctgtgcccaggacctgaacccctgcca 3131 
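The SQ line of each record gives a total length in base pairs followed by per-base counts, which offers a quick consistency check on a transcribed record. The Python sketch below is illustrative only: `parse_sq_line` is a hypothetical helper, and treating any shortfall between the total and the A/C/G/T counts as "other" symbols (for example ambiguity codes) is an assumption.

```python
import re

# Matches lines such as: SQ Sequence 8091 BP; 1311 A; 2716 C; 2488 G; 1576 T;
SQ_PATTERN = re.compile(r"^SQ\s+Sequence\s+(\d+)\s+BP;(.*)$")


def parse_sq_line(line):
    """Return (total_length, {base: count}, other) for one SQ line."""
    m = SQ_PATTERN.match(line.strip())
    if m is None:
        raise ValueError("not an SQ line: %r" % line)
    total = int(m.group(1))
    counts = {
        base: int(n)
        for n, base in re.findall(r"(\d+)\s+([ACGT])\s*;", m.group(2))
    }
    # Residues not accounted for by A, C, G or T
    other = total - sum(counts.values())
    return total, counts, other


total, counts, other = parse_sq_line(
    "SQ Sequence 8091 BP; 1311 A; 2716 C; 2488 G; 1576 T;"
)
print(total, sum(counts.values()), other)  # prints 8091 8091 0
```

A non-zero `other` is not necessarily an error: the 91 BP probe records list only 41 unambiguous bases, which fits a sequence written largely in ambiguity codes.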



RESULT 7

ID Q51746 standard; cDNA; 91 BP. 

AC Q51746; 

DT 31-MAY-1994 (first entry) 

DE Oligonucleotide probe MK14-A 

KW Oligonucleotide; DNA probe; mycobacteria; disease diagnosis; 

KW ss. 

OS Synthetic.

PN EP-571911-A. 

PD 01-DEC-1993. 

PF 24-MAY-1993; 108325.

PR 26-MAY-1992; US-889651. 

PA (BECT ) BECTON DICKINSON CO. 

PI Shank DD, Spears PA; 

DR WPI; 93-378844/48. 

PT New oligonucleotide probes specific for Mycobacteria - used for 

PT detection and amplification of Mycobacteria nucleic acid in 

PT samples 

PS Claim 3; Page 14; 23pp; English. 

CC Oligonucleotide probe MK14-A consists of nucleotides 5-95 of MK14 

CC (Q51735). It hybridized to all spp. of mycobacteria tested, but 

CC cross reacted to a few non-mycobacterial spp. The probe may

CC be useful as an initial screen for mycobacterial infection. 

CC See also Q51735-45 and Q51747-59. 

SQ Sequence 91 BP; 5 A; 17 C; 15 G; 4 T; 

Query Match 0.9%; Score 41; DB 9; Length 91;

Best Local Similarity 3.8%; Pred. No. 1.20e-07; 

Matches 2; Conservative 45; Mismatches 6; Indels 0; Gaps 0;

Db 8 gcgssvhsyyvvhvvshhhsvhhvvhhvhvsvvvvhhvvhvvhhvhyhvyvsv 60 



Qy 140 gcagcgtgcccaggaatatcccccgcaacaccgagagactggatttaaatgga 192 



RESULT 8

ID Q51746 standard; cDNA; 91 BP. 

AC Q51746; 

DT 31-MAY-1994 (first entry)

DE Oligonucleotide probe MK14-A 

KW Oligonucleotide; DNA probe; mycobacteria; disease diagnosis; 

KW ss. 

OS Synthetic. 

PN EP-571911-A.

PD 01-DEC-1993.

PF 24-MAY-1993; 108325. 

PR 26-MAY-1992; US-889651. 

PA (BECT ) BECTON DICKINSON CO.

PI Shank DD, Spears PA;

DR WPI; 93-378844/48. 

PT New oligonucleotide probes specific for Mycobacteria - used for

PT detection and amplification of Mycobacteria nucleic acid in 

PT samples 

PS Claim 3; Page 14; 23pp; English.

CC Oligonucleotide probe MK14-A consists of nucleotides 5-95 of MK14 

CC (Q51735). It hybridized to all spp. of mycobacteria tested, but 

CC cross reacted to a few non-mycobacterial spp. The probe may

CC be useful as an initial screen for mycobacterial infection. 

CC See also Q51735-45 and Q51747-59. 

SQ Sequence 91 BP; 5 A; 17 C; 15 G; 4 T; 

Query Match 0.9%; Score 43; DB 9; Length 91; 

Best Local Similarity 21.3%; Pred. No. 1.05e-08;

Matches 16; Conservative 43; Mismatches 16; Indels 0; Gaps 0; 

Db 3 ctccggcgssvhsyywhvvshhhsvhhvvhhvhvsvvwhhvvhvvhhvhyhvyvsvct 62 

II III I:::::::: :::: ::::::::::::: ::::::: : 

Cp 78 ctgcggtgccaccttgttcaggatcgccagcactaaccccagcgacagggacagcatctg 19 

Db 63 caagcctcggcggcg 77 

I llll II III 
Cp 18 ccagccaacgccgcg 4



RESULT 9 

ID N81164 standard; DNA; 204 BP.

AC N81164; 

DT 08-NOV-1990 (first entry) 

DE Base substituted E.coli beta-galactosidase alpha-fragment. 

KW E.coli beta galactosidase alpha-fragment; base substitutions; ss.

OS Escherichia coli. 

FH Key Location/Qualifiers

FT misc_feature 19..69

FT /*tag= a

FT /function=multiple cloning site

FT primer_bind 187..204

FT /*tag= b

PN EP-285123-A.

PD 05-MAY-1988.

PF 30-MAR-1988; 105163.

PR 03-APR-1987; US-034819.

PA (SUSO) SUOMEN SOKERI OY.

PI Lehtovaara P, Knowles J, Koivula A, Bamford J, Reinikainen T; 

DR WPI; 88-279927/40. 

PT Introducing random point mutations into nucleic acids -

PT by prepn of single stranded template, annealing a primer, elongation,

PT misincorporation, completion of molecules and screening.

PS Disclosure; p; English. 

CC Random point mutations were introduced into the alpha fragment of 

CC E.coli beta-galactosidase. The wild type sequence was obtained as a

CC single stranded template and an oligonucleotide was hybridised to 

CC it to generate a popn of DNA molecules which terminate at all 

CC possible nucleotide positions within a specified region. The 

CC variable 3' ends generated in this way are used as primers for 

CC reverse transcriptase. Nucleotides are misincorporated by the 

CC transcriptase and the molecules are completed to forms that can be 

CC amplified and then expressed in a suitable host-vector system. 

CC The sequence covers all 176 difft base substitutions, most of which 

CC occurred singularly in any given mutant. 

CC See also P80575. 

SQ Sequence 204 BP; 21 A; 47 C; 17 G; 11 T; 108 Others;

Query Match 0.9%; Score 43; DB 1; Length 204; 

Best Local Similarity 8.2%; Pred. No. 1.05e-08; 

Matches 8; Conservative 55; Mismatches 34; Indels 0; Gaps 0; 

Db 92 hhyrrmrbnvyrdynrsdaaawyccyrrsvkydccynachhddhyvybbbvynvhnhnnc 151

Qy 3296 ccgaaggttacagtggcttgttctgtgagttttctccacccatggtcctccctcgtacca 3355



Db 152 ncccbnnhvchnvhbnnhrnwayvrhdarrddvhccv 188 
Qy 3356 gcccctgtgataattttgattgtcagaatggagctca 3392 
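The Best Local Similarity figure printed for each result appears to be the number of exact matches divided by the total number of aligned columns (matches, conservative substitutions, mismatches and indels). This is an inference from the printed counts, not a definition given in this listing. A minimal Python sketch under that assumption, using the counts printed for RESULT 9 (the function name is illustrative only):

```python
def best_local_similarity(matches, conservative, mismatches, indels=0):
    """Percent of aligned columns that are exact matches (assumed definition)."""
    aligned = matches + conservative + mismatches + indels
    if aligned == 0:
        raise ValueError("alignment has no columns")
    return round(100.0 * matches / aligned, 1)


# RESULT 9: Matches 8; Conservative 55; Mismatches 34; Indels 0
print(best_local_similarity(8, 55, 34))  # 8.2
```

The same calculation reproduces the figure printed for RESULT 8 (16 matches out of 75 columns, 21.3%).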



RESULT 10 

ID N81164 standard; DNA; 204 BP. 

AC N81164; 

DT 08-NOV-1990 (first entry) 

DE Base substituted E.coli beta-galactosidase alpha-fragment. 

KW E.coli beta galactosidase alpha-fragment; base substitutions; ss.

OS Escherichia coli. 

FH Key Location/Qualifiers 

FT misc_feature 19..69

FT /*tag= a

FT /function=multiple cloning site

FT primer_bind 187..204

FT /*tag= b

PN EP-285123-A. 

PD 05-MAY-1988.

PF 30-MAR-1988; 105163. 

PR 03-APR-1987; US-034819. 

PA (SUSO) SUOMEN SOKERI OY.

PI Lehtovaara P, Knowles J, Koivula A, Bamford J, Reinikainen T; 



DR WPI; 88-279927/40. 

PT Introducing random point mutations into nucleic acids -

PT by prepn of single stranded template, annealing a primer, elongation, 

PT misincorporation, completion of molecules and screening. 

PS Disclosure; p; English.

CC Random point mutations were introduced into the alpha fragment of 

CC E.coli beta-galactosidase. The wild type sequence was obtained as a 

CC single stranded template and an oligonucleotide was hybridised to 

CC it to generate a popn of DNA molecules which terminate at all 

CC possible nucleotide positions within a specified region. The 

CC variable 3' ends generated in this way are used as primers for 

CC reverse transcriptase. Nucleotides are misincorporated by the 

CC transcriptase and the molecules are completed to forms that can be 

CC amplified and then expressed in a suitable host-vector system.

CC The sequence covers all 176 difft base substitutions, most of which 

CC occurred singularly in any given mutant. 

CC See also P80575. 

SQ Sequence 204 BP; 21 A; 47 C; 17 G; 11 T; 108 Others; 

Query Match 0.9%; Score 44; DB 1; Length 204;

Best Local Similarity 17.6%; Pred. No. 3.08e-09;



Db 16

Cp 4573

Db 76

Cp 4513

Db 136

Cp 4454



RESULT 11 

ID T24530 standard; cDNA to mRNA; 341 BP. 

AC T24530; 

DT 25-SEP-1996 (first entry) 

DE Human gene signature HUMGS06577. 

KW Gene signature; messenger RNA; mRNA; relative abundance; frequency; 

KW human; cloning; mapping; non-biased library; diagnosis; detection; 

KW cell typing; abnormal cell function; ss. 

OS Homo sapiens.

PN WO9514772-A1.

PD 01-JUN-1995. 

PF 11-NOV-1994; J01916.

PR 12-NOV-1993; JP-355504.

PA (MATS/) MATSUBARA K.

PA (OKUB/) OKUBO K.

PI Matsubara K, Okubo K; 

DR WPI; 95-206931/27.

PT Identifying gene signatures in 3'-directed human cDNA library - e.g.

PT for diagnosis of abnormal cell function, by preparing cDNA that 

PT reflects relative abundance of corresp. mRNA in specific human 

PT tissues 

PS Claim 1; Page 1632; 2245pp; Japanese. 

CC A single-stranded DNA (or its complementary strand or the corresp.

CC double-stranded DNA) which comprises one of the 7837 "GS" sequences 

CC given in T19001-T26837 and which is able to hybridise to part of 

CC human genomic DNA, cDNA or mRNA is claimed. The GS (Gene Signature) 

CC sequences were obtained from 3'-directed cDNA libraries prepared

CC from various human tissues; synthesis of cDNA was initiated from the

CC 3'-end of mRNA by using poly(T) as the sole primer. Since the 3'-

CC untranslated sequence is unique to a particular mRNA species, almost

CC all the 3'-oriented cDNAs hybridise with specific mRNAs. Each library

CC is constructed so as to reflect accurately the relative abundance of 

CC different mRNAs in the particular tissue from which it was derived. 

CC The appearance frequency of a given GS in a cDNA library can be 

CC determined (esp. using primers and probes derived from the GS

CC sequences) as a means of diagnosing abnormal cell function or for

CC recognising different cell types.

SQ Sequence 341 BP; 73 A; 102 G; 68 T;



Query Match 0.9%; Score 45; DB 21; Length 341; 

Best Local Similarity 75.9%; Pred. No. 8.92e-10;

Matches 63; Conservative 0; Mismatches 20; Indels 0; Gaps 0; 

Db 103 agcaagcgncggaaatacgtcttccagtgcacggntggctcctcgtttgtagaagaggtg 162 

mini! iimim m i n; i mum nm M mi; 

Qy 4477 agcaagcggcggaaatactctttcgaatgcactgacggctcctcctttgtggacgaggtt 4536 

Db 163 gagagacacttagagtgcggctg 185 

Mil I I llllllllll 
Qy 4537 gagaaagtggtgaagtgcggctg 4559 



RESULT 12 

ID Q70468 standard; DNA; 114 BP.

AC Q70468; 

DT 05-APR-1995 (first entry) 

DE Generic DNA sequence to generate a random TSAR peptide library.

KW TSAR; totally synthetic affinity reagent; synthetic; binding domain;

KW effector domain; concateneated heterofunctional protein; linker;

KW direct; rapid; detection; screening; treatment; generic; ss.

OS Synthetic. 

FH Key Location/Qualifiers 

FT misc_feature 55..60

FT /*tag= a

FT /note= "this sequence represents 'Z'; Z can be a

FT sequence of 6, 9 or 12 nucleotides (see

FT comments)"

PN WO9418318-A.

PD 18-AUG-1994.

PF 01-FEB-1994; U00977.

PR 01-FEB-1993; US-013416.

PR 30-DEC-1993; US-176500.

PR 31-JAN-1994; US-189331.

PA (UYNC-) UNIV NORTH CAROLINA.

PI Fowlkes DM, Kay BK; 

DR WPI; 94-279739/34. 

DR P-PSDB; R65154.

PT Identifying proteins or peptide(s) which bind a ligand - by

PT screening a recombinant vector library expressing fusion proteins 

PT comprising a binding domain and an effector domain 

PS Disclosure; Page 35; 255pp; English.

CC Q70468 is a generic DNA sequence used to generate random TSAR (Totally 

CC Synthetic Affinity Reagents) peptides. This generic formula can also be 

CC represented as follows: X(NNB)11(TGC)(NNB)6Z(NNB)7(TGC)(NNB)10Y. X 

CC and Y are flanking restriction sites (X is not the same as Y) that are 

CC not specified further. Other generic sequences are shown in Q70466-68.

CC Other specific peptides generated by these generic sequences are shown in

CC R65151-54. TSARs are concatenated heterofunctional proteins or peptides,

CC comprising at least two functional regions - a binding domain with

CC affinity for a ligand and a second effector peptide portion that is

CC chemically or biologically active. They may further comprise a linker

CC peptide between the 2 domains. The oligonucleotides are also designed so

CC that the expressed peptide contains 2 or 4 cysteine residues positioned

CC in, or flanking, the unpredicted or variant residues. These residues

CC confer some degree of conformational rigidity to the peptides. The TSARs

CC or compsns. comprising a TSAR binding domain can be used in vivo to

CC deliver a chemically or biologically active moiety, eg. metal ion,

CC radioisotope, peptide, toxin or enzyme, to the specific target or on the

CC cell. They can also replace the function of macromolecules, eg.

CC monoclonal or polyclonal antibodies and therefore circumvent the need

CC for complex methods of hybridoma formation or in vivo antibody

CC production. The TSARs are easily characterised and have designed activity

CC allowing direct and rapid detection in a screening process.

SQ Sequence 114 BP; 0 A; 2 C; 2 G; 2 T; 

Query Match 0.8%; Score 37; DB 12; Length 114;

Best Local Similarity 4.6%; Pred. No. 1.41e-05; 

Matches 5; Conservative 33; Mismatches 71; Indels 0; Gaps 0; 

Db 6 bnnbnnbnnbnnbnnbnnbnnbnnbnnbtgcnnbnnbnnbnnbnnbnnbnnnnnnnnbnn 65 



Cp 153 cctgggcacgctgcgcagcgccagcccgtgacagtccactgtgctgcccgagcaagagca 94 
Db 66 bnnbnnbnnbnnbnnbtgcnnbnnbnnbnnbnnbnnbnnbnnbnnbnnb 114 
Cp 93 ctgcgccgggcacgcctgcggtgccaccttgttcaggatcgccagcact 45 
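The generic formula in the CC lines of Q70468 can be checked against the record's 114 BP length and the feature annotated at positions 55 to 60. The Python sketch below expands the formula under two assumptions that are not stated in the record: the flanking sites X and Y contribute no bases, and Z is six unspecified nucleotides (the shortest of the 6, 9 or 12 options in the FT note). The function name and the lowercase output are illustrative choices.

```python
import re


def expand_tsar_formula(formula, z="n" * 6):
    """Expand a formula such as 'X(NNB)11(TGC)Z...Y' into a lowercase sequence.

    Assumptions: X and Y add no bases; Z defaults to six unspecified bases.
    """
    parts = []
    for unit, count, placeholder in re.findall(r"\(([A-Z]+)\)(\d*)|([XYZ])", formula):
        if unit:
            # A bracketed unit repeated `count` times (once if no count is given).
            parts.append(unit.lower() * int(count or 1))
        elif placeholder == "Z":
            parts.append(z)
        # X and Y are unspecified restriction sites and are skipped here.
    return "".join(parts)


seq = expand_tsar_formula("X(NNB)11(TGC)(NNB)6Z(NNB)7(TGC)(NNB)10Y")
print(len(seq))             # 114
print(seq.find("tgc") + 1)  # 34, the 1-based start of the first TGC codon
```

Under these assumptions Z occupies positions 55 to 60, and the first `tgc` starts at position 34, which agrees with the Db line of the alignment above.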



RESULT 13 

ID Q70469 standard; DNA; 114 BP. 

AC Q70469; 

DT 07-APR-1995 (first entry) 

DE Generic DNA sequence to generate a random TSAR peptide library.

KW TSAR; totally synthetic affinity reagent; synthetic; binding domain; 

KW effector domain; concateneated heterofunctional protein; linker; 

KW direct; rapid; detection; screening; treatment; generic; ss.

OS Synthetic. 

FH Key Location/Qualifiers 

FT misc_feature 55..60

FT /*tag= a

FT /note= "this sequence represents 'Z'; Z can be a

FT sequence of 6, 9 or 12 nucleotides (see

FT comments)"

PN WO9418318-A.

PD 18-AUG-1994. 

PF 01-FEB-1994; U00977. 

PR 01-FEB-1993; US-013416. 

PR 30-DEC-1993; US-176500.

PR 31-JAN-1994; US-189331.

PA (UYNC-) UNIV NORTH CAROLINA. 

PI Fowlkes DM, Kay BK; 

DR WPI; 94-279739/34. 

PT Identifying proteins or peptide(s) which bind a ligand - by 

PT screening a recombinant vector library expressing fusion proteins 

PT comprising a binding domain and an effector domain 

PS Disclosure; Page 35; 255pp; English.

CC Q70469 is a generic DNA sequence used to generate random TSAR peptides.

CC This generic formula can be represented as follows: X(TGC)(NNB)10- 

CC (TGC)(NNB)6Z(NNB)2(TGC)(NNB)14(TGC)Y. X and Y are flanking restriction 

CC sites (X is not the same as Y) that are not specified further. This 

CC sequence generates peptides that are cloverleaf in structure. Other 

CC generic sequences are shown in Q70465-68. Other specific peptides 

CC generated by these generic sequences are shown in R65150-54. TSARs are 

CC concatenated heterofunctional proteins or peptides, comprising at least 

CC two functional regions - a binding domain with affinity for a ligand and 

CC a second effector peptide portion that is chemically or biologically 

CC active. They may further comprise a linker peptide between the 2 domains. 

CC The oligonucleotides are also designed so that the expressed peptide

CC contains 2 or 4 cysteine residues positioned in, or flanking, the

CC unpredicted or variant residues. These residues confer some degree of

CC conformational rigidity to the peptides. The TSARs or compsns. comprising

CC a TSAR binding domain can be used in vivo to deliver a chemically or 

CC biologically active moiety, eg. metal ion, radioisotope, peptide, toxin 

CC or enzyme, to the specific target or on the cell. They can also replace 

CC the function of macromolecules, eg. monoclonal or polyclonal antibodies

CC and therefore circumvent the need for complex methods of hybridoma 

CC formation or in vivo antibody production. The TSARs are easily 

CC characterised and have designed activity allowing direct and rapid 

CC detection in a screening process. 

SQ Sequence 114 BP; 0 A; 4 C; 4 G; 4 T; 

Query Match 0.8%; Score 36; DB 12; Length 114; 

Best Local Similarity 4.7%; Pred. No. 4.52e-05;

Matches 5; Conservative 32; Mismatches 69; Indels 0; Gaps 0; 

Db 6 bnnbnnbnnbnnbnnbnnbnnbnnbnnbtgcnnbnnbnnbnnbnnbnnbnnnnnnnnbnn 65 

Cp 153 cctgggcacgctgcgcagcgccagcccgtgacagtccactgtgctgcccgagcaagagca 94 

Db 66 btgcnnbnnbnnbnnbnnbnnbnnbnnbnnbnnbnnbnnbnnbnnb 111 

:||| :::::::::::::: 

Cp 93 ctgcgccgggcacgcctgcggtgccaccttgttcaggatcgccagc 48 

RESULT 14

ID V44650 standard; DNA; 91 BP. 

AC V44650; 

DT 06-OCT-1998 (first entry) 

DE Mammalian DNA replication origin consensus sequence, uniorsconsensus.

KW DNA replication origin; human; mammal; alphaconsensus; uniorsconsensus; 

KW anti-gene; DNA replication inhibitor; shuttle vector construct creation; 

KW gene therapy; ss.

OS Mammalia.

PN WO9827200-A2.

PD 25-JUN-1998. 

PF 12-DEC-1997; CA0972. 

PR 21-MAY-1997; US-047322.

PR 16-DEC-1996; US-033374.

PA (UYMC-) UNIV MCGILL.

PI Cossons NH, Nielsen TO, Price GB, Zannis-Hadjopoulos M;

DR WPI; 98-362770/31.

PT Human or mammalian origin of replication consensus sequences - for

PT inhibiting DNA replication, for controlling initiation of 

PT replication, maintaining circular plasmids and in assembly of human 

PT artificial chromosomes 

PS Claim 1; Page 42; 54pp; English. 

CC This sequence represents a human or mammalian DNA replication origin 

CC consensus sequences of the invention, designated uniorsconsensus. 

CC Administration of the consensus sequence or an anti-gene (comprising a 

CC double stranded copy of the consensus) is used to inhibit DNA replication 

CC in vivo or in vitro. The consensus sequences can also be inserted into an 

CC expression vector, used subsequently for in vitro transfection of 

CC mammalian cells, to control initiation of DNA replication. They can also 

CC be used to maintain circular plasmids that are capable of

CC semi-conservative replication in proliferating mammalian cells, or

CC inserted into mammalian or human artificial chromosome vectors for gene 

CC therapy. Particularly, they are used to create shuttle vector constructs 

CC for defining the essential mammalian elements required for maintenance of 

CC chromosomal function. The consensus sequence can be combined with cloned

CC human telomeres and large centromeric blocks for assembly of human 

CC artificial chromosomes and maintained as bacterial plasmids, circular or 

CC linear, large or small yeast artificial chromosomes (YACs) or as episomal 

CC elements. 

SQ Sequence 91 BP; 15 A; 1 C; 4 G; 7 T; 

Query Match 0.7%; Score 32; DB 46; Length 91; 

Best Local Similarity 13.8%; Pred. No. 4.23e-03;

Matches 11; Conservative 45; Mismatches 24; Indels 0; Gaps 0;

Db 5 waakrawrwwkkdavwwgakrwwkwvwhrassacmdwkaaktwkggwtwarrywkgrkmw 64
W ;!!:: :::: :|::: :::|: | ::;;| :|:: : : ::;;!; :; 

Qy 4679 aaatacagaacagacttatttttattatgagaataaagactttttttctgcatttggaaa 4738

Db 65 wtwkawsdatakwwkdakw 84 

: : I: :| I ::: : 
Qy 4739 aaaaaaaaaaaaaaactcga 4758 



PT inhibiting DNA replication, for controlling initiation of 

PT replication, maintaining circular plasmids and in assembly of human 

PT artificial chromosomes 

PS Claim 1; Page 42; 54pp; English.

CC This sequence represents a human or mammalian DNA replication origin 

CC consensus sequences of the invention, designated uniorsconsensus. 

CC Administration of the consensus sequence or an anti-gene (comprising a 

CC double stranded copy of the consensus) is used to inhibit DNA replication 

CC in vivo or in vitro. The consensus sequences can also be inserted into an 

CC expression vector, used subsequently for in vitro transfection of 

CC mammalian cells, to control initiation of DNA replication. They can also 

CC be used to maintain circular plasmids that are capable of

CC semi-conservative replication in proliferating mammalian cells, or 

CC inserted into mammalian or human artificial chromosome vectors for gene 

CC therapy. Particularly, they are used to create shuttle vector constructs 

CC for defining the essential mammalian elements required for maintenance of 

CC chromosomal function. The consensus sequence can be combined with cloned 

CC human telomeres and large centromeric blocks for assembly of human 

CC artificial chromosomes and maintained as bacterial plasmids, circular or 

CC linear, large or small yeast artificial chromosomes (YACs) or as episomal 

CC elements.

SQ Sequence 91 BP; 15 A; 1 C; 4 G; 7 T; 

Query Match 0.7%; Score 31; DB 46; Length 91; 

Best Local Similarity 9.1%; Pred. No. 1.27e-02;

Matches 8; Conservative 54; Mismatches 25; Indels 1; Gaps 1; 

Db 1 awmtwaakrawrwwkkdavwwgakrwwkwvwhrassacmdwkaaktwkggwtwarrywkg 60 

|::|: :: ::::::: :::| ::: ::::: :| :::: |: : :|::::: 

Cp 1224 atatagggagagaaggttcaagttgtggagatcctga-aaagcatctacccgaaggcagt 1166 

Db 61 rkmwwtwkawsdatakwwwkdakwkmwr 88 

Cp 1165 ttatcttgttggcattcaataataggag 1138 



Search completed: Sat May 29 23:08:07 1999 

RESULT 15

ID V44650 standard; DNA; 91 BP. 

AC V44650; 

DT 06-OCT-1998 (first entry) 

DE Mammalian DNA replication origin consensus sequence, uniorsconsensus. 

KW DNA replication origin; human; mammal; alphaconsensus; uniorsconsensus; 

KW anti-gene; DNA replication inhibitor; shuttle vector construct creation; 

KW gene therapy; ss. 

OS Mammalia. 

PN WO9827200-A2. 

PD 25-JUN-1998. 

PF 12-DEC-1997; CA0972. 

PR 21-MAY-1997; US-047322.

PR 16-DEC-1996; US-033374.

PA (UYMC-) UNIV McGILL. 